Abstract

We discuss new methods for the recovery of signals with block-sparse structure, based on $\ell_1$-minimization. Our emphasis is on verifiable conditions on the problem parameters (sensing matrix and the block structure) for accurate recovery and efficiently computable bounds for the recovery error. These bounds are then optimized with respect to the method parameters to construct estimators with improved statistical properties. To justify the proposed approach we provide an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. We also propose a new matching pursuit algorithm for block-sparse recovery.



1 Introduction 

The problem we consider in this paper is to estimate a linear transform $Bx \in \mathbb{R}^N$ of a vector $x \in \mathbb{R}^n$ from the observations

$$y = Ax + u + \xi. \qquad (1.1)$$

Here $A$ is a given $m \times n$ sensing matrix, $B$ is a given $N \times n$ matrix, and $u + \xi$ is the observation error; in this error, $u$ is an unknown nuisance known to belong to a given compact convex set $\mathcal{U} \subset \mathbb{R}^m$ symmetric w.r.t. the origin, and $\xi$ is random noise with known distribution $P$.

We assume that the space $\mathbb{R}^N$ where $Bx$ lives is represented as $\mathbb{R}^N = \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_K}$, so that a vector $w \in \mathbb{R}^N$ is a block vector: $w = [w[1]; \dots; w[K]]$ with blocks $w[k] \in \mathbb{R}^{n_k}$, $1 \le k \le K$. In particular, $Bx = [B[1]x; \dots; B[K]x]$ with $n_k \times n$ matrices $B[k]$, $1 \le k \le K$. While we do not assume that the vector $x$ is sparse in the usual sense, we do assume that the linear transform $Bx$ to be estimated is $s$-block sparse, meaning that at most a given number $s$ of the blocks $B[k]x$, $1 \le k \le K$, are nonzero.

The recovery routines we intend to consider are based on block-$\ell_1$ minimization, i.e., the estimate $\hat w(y)$ of $w = Bx$ is $B\hat z(y)$, where $\hat z(y)$ is obtained by minimizing the norm $\sum_{k=1}^K \|B[k]z\|_{(k)}$ over signals $z \in \mathbb{R}^n$ with $Az$ "fitting," in a certain precise sense, the observations $y$. Above, $\|\cdot\|_{(k)}$ are given in advance norms on the spaces $\mathbb{R}^{n_k}$ where the blocks of $Bx$ take their values.

In the sequel we refer to the given in advance collection $(B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$ as the representation structure (r.s.). Given such a structure and $A$, our ultimate goal is to understand how well we can recover the $s$-block-sparse transform $Bx$ by appropriately implemented block $\ell_1$ minimization.

Related Compressed Sensing research Our situation and goal form a straightforward extension of the usual sparsity/block sparsity Compressed Sensing framework. Indeed, the standard representation structure with $B = I_n$, $n_k = 1$, and $\|\cdot\|_{(k)} = |\cdot|$, $1 \le k \le K = n$, leads to the standard Compressed Sensing setting: recovering a sparse signal $x \in \mathbb{R}^n$ from its noisy observations (1.1) via $\ell_1$ minimization. The case of nontrivial block structure $\{n_k, \|\cdot\|_{(k)}\}_{k=1}^K$ and $B = I_n$ is generally referred to as block-sparse, and has been considered in numerous recent papers. Specifically, there seem to be a number of applications where block-sparsity (with $B = I_n$) arises naturally (see, e.g., [16] and references therein), such as multi-band signals, measurements of gene expression levels, or estimation of multiple measurement vectors sharing a joint sparsity pattern, among many others. Several methods of estimation and selection extending the "plain" $\ell_1$-minimization to block sparsity were proposed and investigated recently. Most of the related research has focused so far on block regularization schemes, namely group Lasso recovery

$$\hat x(y) \in \mathrm{Argmin}_{z \in \mathbb{R}^n} \Big\{ \|Az - y\|_2^2 + \lambda \sum_{k=1}^K \|z[k]\|_2 \Big\}$$

(here $\|\cdot\|_2$ is the Euclidean norm of the block). In particular, the literature on "plain Lasso" (the case of
nfc = l, 1 < k < K = n) has a important counterpart on group Lasso, see, e.g., [H HI [T^l dH [ISl EHl ESI O 
EH EZl ESI EHl EOl EU [33], and references therein. Another celebrated technique of sparse recovery, Dantzig 
selector, originating from [9], has also received its counterpart for recovery of block-sparse signals, which is dealt with in [20, 25]. Most of the cited papers focus on bounding recovery errors in terms of the magnitude of the observation noise and the "s-concentration" of the true signal $x$ (the distance from the space of signals with at most $s$ nonzero blocks, that is, the sum of magnitudes $\|x[k]\|_2$ of all but the $s$ largest in magnitude blocks in $x$). Typically, these results rely on a natural block analogy ("Block RIP," see, e.g., [16]) of the
celebrated Restricted Isometry Property introduced by Candes and Tao [HIIIO], or on block analogies [26] 
of the Restricted Eigenvalue Property introduced in [6].



Contributions of this paper The first (by itself, minor) novelty in our problem setting is the presence of the linear mapping $B$. We are not aware of any preceding work handling the case of a "nontrivial" (i.e., different from the identity) $B$. We qualify this novelty as minor, since in fact the case of a nontrivial $B$ can be reduced to the one of $B = I$.¹ However, "can be reduced" is not the same as "should be reduced," since nontrivial $B$'s arise naturally in many applications. This is the case, e.g., when $x$ is the solution of a linear finite-difference equation with sparse right hand side ("evolution of a linear plant corrected from time to time by impulse control"), where $B$ is the matrix of the corresponding finite-difference operator. We believe that introducing $B$ adds some useful flexibility (and as a matter of fact costs nothing, as far as the theoretical analysis is concerned).

We believe, however, that the major novelty in what follows is the emphasis on verifiable conditions on $A$ and the r.s. which guarantee good recovery of the transform $Bx$ from noisy observations of $Ax$, provided that the transform in question is nearly $s$-block sparse, and the observation noise is low. Note that such guarantees cannot be obtained from the "classical" conditions used when studying theoretical properties of
block-sparse recovery (with a notable exception of the Mutual Block-Incoherence condition of [E]). The 
latter means that given the matrix A, one cannot answer in any reasonable time if the (Block-) Restricted 
Isometry or Restricted Eigenvalue property hold with given parameters. While the efficient verifiability is 
by no means necessary for a condition to be meaningful and useful, we believe also that verifiability has 
its value and is worthy of being investigated. In particular, verifiability allows one to design new recovery
routines with explicit confidence bounds for the recovery error and then optimize these bounds with respect 
to the parameters of the recovery. In this respect, the current work extends the results of |23 | 12 ^ 122]. where 

¹ Assuming, e.g., that $x \mapsto Bx$ is an "onto" mapping, we can treat $Bx$ as our signal, the observations being $Py$, where $P$ is the projector onto the orthogonal complement to the linear subspace $A \cdot \mathrm{Ker}\,B$ in $\mathbb{R}^m$; with $y = Ax + u + \xi$, we have $Py = GBx + P(u + \xi)$ with an explicitly given matrix $G$.
$\ell_1$-recovery of the "usual" sparse vectors was considered (in the first two papers, in the case of uncertain-but-bounded observation errors, and in the third, in the case of Gaussian observation noise). Precisely, we propose here new routines of block-sparse recovery which explicitly utilize the verifiability certificate, the contrast matrix, and show how these routines may be tuned to attain the best performance bounds.

To give an impression of what will follow, we present here a short summary of our major results. To streamline this summary, we restrict ourselves for the time being to the case where (a) the random noise $\xi$ in (1.1) is Gaussian: $\xi \sim \mathcal{N}(0, \sigma^2 I_m)$ with known $\sigma > 0$, and (b) all the norms $\|\cdot\|_{(k)}$ are just $\|\cdot\|_r$-norms, with the value of $r$ common for all $1 \le k \le K$. Let $s$ be a given positive integer, an a priori upper bound on the number of nonzero blocks in the transforms we intend to recover well, and let $\epsilon \in (0,1)$ be a given tolerance. We fix an $m \times n$ sensing matrix $A$ and an r.s. $(B, n_1, \dots, n_K, \|\cdot\|_r, \dots, \|\cdot\|_r)$.



Condition $Q_{s,q}$ Given $s$ and $q \in [1, \infty]$, we introduce a condition $Q_{s,q}$ on an $m \times N$ contrast matrix $H$, specifically, the condition

$$\forall (x \in \mathbb{R}^n): \quad L_{s,q}(Bx) \le s^{1/q} L_\infty(H^T A x) + \tfrac{1}{4} s^{1/q - 1} L_1(Bx),$$

where for $w = [w[1]; \dots; w[K]] \in \mathbb{R}^N$ and $p \in [1, \infty]$, $L_p(w) = \|[\|w[1]\|_r; \dots; \|w[K]\|_r]\|_p$ is the norm of $w$; $L_{s,p}(w)$ is the norm of $w$ obtained as follows: we zero out all but the $s$ largest in "magnitude" $\|w[k]\|_r$ blocks in $w$, and take the $L_p$-norm of the resulting $s$-block-sparse vector. For example, $L_{s,\infty}(w)$ is, independently of $s$, the maximum of magnitudes of blocks in $w$.

Recovery routines Given an $\epsilon > 0$ and an $m \times N$ contrast matrix $H = [h^1, \dots, h^N]$, we introduce two recovery routines: regular $\ell_1$ recovery (cf. (block-) Dantzig selector)

$$\hat x_{reg}(y) \in \mathrm{Argmin}_z \left\{ L_1(Bz) : \|H^T(y - Az)\|_\infty \le \nu(H) \right\},$$

$$\nu(H) = \max_{1 \le i \le N} \left[ \max_{u \in \mathcal{U}} u^T h^i + \sigma\, \mathrm{Erfinv}\!\left(\frac{\epsilon}{2N}\right) \|h^i\|_2 \right], \qquad (1.2)$$

Erfinv(·) being the inverse error function,² and penalized $\ell_1$ recovery (cf. (block-) Lasso)

$$\hat x_{pen}(y) \in \mathrm{Argmin}_z \left[ L_1(Bz) + 2s \|H^T(y - Az)\|_\infty \right].$$

Note that the regular $\ell_1$ recovery can be undefined; this happens when the corresponding optimization problem is infeasible. The penalized recovery is always well defined.
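As an illustration of how the threshold $\nu(H)$ in (1.2) can be evaluated, here is a minimal sketch. It assumes, purely for the example, that the nuisance set $\mathcal{U}$ is a Euclidean ball of radius `nuisance_radius` (so that $\max_{u \in \mathcal{U}} u^T h = $ `nuisance_radius` $\cdot \|h\|_2$), and that Erfinv($\delta$) is the upper-tail quantile of the standard normal distribution; the function names are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist

def erfinv_tail(delta):
    """Return t such that P(N(0,1) > t) = delta (assumed Erfinv convention)."""
    return NormalDist().inv_cdf(1.0 - delta)

def nu(H_columns, sigma, eps, nuisance_radius):
    """Evaluate nu(H) from (1.2), assuming U is the Euclidean ball of the
    given radius, so that max_{u in U} u^T h = nuisance_radius * ||h||_2.

    H_columns: list of the N columns h^1, ..., h^N of H, each of length m.
    """
    N = len(H_columns)
    t = erfinv_tail(eps / (2 * N))
    def l2(h):
        return math.sqrt(sum(v * v for v in h))
    # For a ball-shaped U, both terms of (1.2) scale with ||h^i||_2.
    return max((nuisance_radius + sigma * t) * l2(h) for h in H_columns)
```

For other nuisance sets the inner maximum is the support function of $\mathcal{U}$ at $h^i$ and has to be computed accordingly.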



Error bounds for regular and penalized recoveries Our main related result is as follows (see Theorems 3.1, 3.3): Let a contrast matrix $H$ satisfy the condition $Q_{s,q}$. Then there exists a set $\Xi$ of realizations of $\xi$ such that $\mathrm{Prob}\{\xi \in \Xi\} \ge 1 - \epsilon$ and for all $\xi \in \Xi$, $x \in \mathbb{R}^n$ and $u \in \mathcal{U}$, $\hat x_{reg}(Ax + u + \xi)$ is well defined, and for both $\hat x = \hat x_{reg}(Ax + u + \xi)$ and $\hat x = \hat x_{pen}(Ax + u + \xi)$ one has

$$\forall p \in [1, q]: \quad L_p(B\hat x - Bx) \le 4(2s)^{1/p} \left[ \nu(H) + s^{-1} v_s(Bx) \right], \qquad (1.3)$$

where $v_s(w)$ is the "s-concentration of $w$," that is, the sum of magnitudes of all but the $s$ largest in magnitude blocks in $w$. Note that for the case of the standard r.s., the corresponding constructions and results were developed in [22].



² Erfinv($\delta$) means that $\frac{1}{\sqrt{2\pi}} \int_{\mathrm{Erfinv}(\delta)}^{\infty} e^{-t^2/2}\, dt = \delta$.
Verifiable sufficient condition for $Q_{s,q}$ and contrast optimization Similarly to the plain and block Restricted Isometry/Eigenvalue Properties, condition $Q_{s,q}$ is computationally intractable. In other words, given a candidate contrast matrix $H$, it is difficult to verify whether or not it satisfies $Q_{s,q}$. We, however, can point out a verifiable sufficient condition for $H$ to satisfy $Q_{s,q}$. Specifically, we show (Proposition 5.1) that $H$ definitely satisfies $Q_{s,q}$ if there exists an $N \times N$ matrix $V$ (which we treat as a $K \times K$ block matrix with $n_k \times n_\ell$ blocks $V^{k\ell}$) such that

$$\text{(a)}: \ B = VB + H^T A, \quad \text{and} \quad \text{(b)}: \ \left\| \left[ \|V^{1\ell}\|_{r,r}; \|V^{2\ell}\|_{r,r}; \dots; \|V^{K\ell}\|_{r,r} \right] \right\|_{s,q} \le \tfrac{1}{4} s^{1/q - 1}, \ 1 \le \ell \le K, \qquad (1.4)$$

where $\|V^{k\ell}\|_{r,r} = \max_{u \in \mathbb{R}^{n_\ell}} \{ \|V^{k\ell} u\|_r : \|u\|_r \le 1 \}$, and $\|u\|_{s,p}$ is the norm on $\mathbb{R}^K$ defined as follows: we zero out all but the $s$ largest in magnitude entries in vector $u$, and take the $\|\cdot\|_p$-norm of the resulting vector. One can use the above sufficient condition in order to build a "quasi-optimal" contrast matrix, specifically, by minimizing $\nu(H)$, defined in (1.2), over pairs $(V, H)$ satisfying the system of convex constraints (1.4) (provided, of course, that this system of constraints is feasible). The resulting problem is computationally tractable, provided that the matrix norms $\|\cdot\|_{r,r}$ are efficiently computable, which indeed is the case when $r = 1$, or $r = 2$, or $r = \infty$.



Verifiable sufficient condition in the case $q = \infty$ In general, the proposed verifiable (at least for $r \in \{1, 2, \infty\}$) sufficient condition for $H$ to satisfy $Q_{s,q}$ is not necessary, and the condition $Q_{s,q}$ itself seems to be intractable. There exists, however, a notable exception: the case of $q = \infty$ and $r = \infty$. We show (Proposition 4.1) that here the verifiable sufficient condition is necessary and sufficient for $H$ to satisfy $Q_{s,\infty}$. Moreover, the latter condition is "fully computationally tractable," meaning that one can optimize efficiently the quantity $\nu(H)$ over the contrast matrices $H$ satisfying $Q_{s,\infty}$, thus ending up with optimal, as far as the error bound (1.3) is concerned, recovery routines. Note that when $q = \infty$, the bound (1.3) holds true in the largest possible range $1 \le p \le \infty$ of values of $p$.

In the case of the standard r.s., the sufficient condition (1.4) reduces to the verifiable sufficient condition for the validity of $\ell_1$ recovery established in [23]. As we have mentioned above, the only verifiable sufficient condition known so far for the validity of block $\ell_1$ recovery of block-sparse signals is the Mutual Block-Incoherence condition (cf. [15] and [17]) dealing with the case of $B = I$ and $r = 2$. This condition is a block analogy of the usual mutual incoherence condition originating from [13]. We show in Section 5.4 that the Mutual Block-Incoherence condition is "covered" by the case of $B = I$, $r = 2$ of the verifiable condition (1.4).



Oracle inequality in the case $q = \infty$, $r = \infty$ As the majority of good error bounds in Compressed Sensing, the error bound (1.3) expresses the following quite intuitive fact. Imagine that instead of indirect observation (1.1) of a transform $w = Bx$, we were observing this transform directly with noise: $y = w + \zeta$. Here the observation error $\zeta$ is such that with probability $\ge 1 - \epsilon$ one has $L_\infty(\zeta) \le \nu$. It is easily seen that in the latter case, in the range $v_s(w) \le s\nu(H)$ of $s$-concentrations of $w$, the best $(1 - \epsilon)$-reliable bound on the $L_p(\cdot)$-norm of the recovery error of $w$ coincides, within an absolute constant factor, with the right hand side of (1.3). Thus, a natural interpretation of the error bound (1.3) is that as far as recovery of transforms $Bx$ with $s$-concentration $v_s(Bx) \le s\nu(H)$ is concerned, everything is as if we were given a direct observation of $Bx$ contaminated with a noise of typical $L_\infty$-magnitude $\le \nu(H)$. One of the main results of this paper is that, to some extent, the opposite also is true, provided that $r = \infty$ and (1.3) holds true in the entire range $1 \le p \le \infty$ of values of $p$. Specifically, we prove (see Proposition 4.2) the following. Let all the block norms be the $\|\cdot\|_\infty$-norms, and let the observation error be present (that is, either $\sigma > 0$, or $\mathcal{U}$ contains a neighborhood of the origin). Let, further, for some integer $S$ and positive $\nu$ there exist a routine (an oracle) $\hat w(y) = B\hat x(y)$ for recovering $Bx$ from observations (1.1) such that

$$\forall (u \in \mathcal{U},\ x \in \mathbb{R}^n : v_S(Bx) \le S\nu): \quad \mathrm{Prob}_{\xi \sim \mathcal{N}(0, I)} \left\{ L_\infty(B[x - \hat x(Ax + u + \sigma\xi)]) \le \nu + S^{-1} v_S(Bx) \right\} \ge 1 - \epsilon$$
(cf. (1.3) with $p = \infty$). Then for every integer $s$ in the range $1 \le s \le s_*$ there exist a contrast matrix $H \in \mathbb{R}^{m \times N}$ and a "certificate" $V = [V^{k\ell}]_{k,\ell=1}^K \in \mathbb{R}^{N \times N}$ such that

$$B = VB + H^T A, \quad \|V^{k\ell}\|_{\infty,\infty} \le \frac{1}{4s}, \ 1 \le k, \ell \le K, \quad \text{and} \quad \nu(H) \le \nu_*,$$

with the quantities $s_*$ and $\nu_*$ determined by $S$, $\nu$, $\epsilon$ and $N$ as in Proposition 4.2.

In other words, when $\epsilon$ is small, the condition (1.4) is satisfied by an appropriate $H$ for all $s$ in the range $[1, s_*]$, such that $s_*$ and $\nu(H)$ coincide, within some absolute constant factors, with $S$ and $\nu$, respectively. All proofs are placed in the Appendix.



2 Problem statement 

Notation. In the sequel, we deal with

• signals - vectors $x = [x_1; \dots; x_n] \in \mathbb{R}^n$, and an $m \times n$ sensing matrix $A$;

• representations of signals - block vectors $w = [w[1]; \dots; w[K]] \in W := \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_K}$, and the representation matrix $B = [B[1]; \dots; B[K]]$, $B[k] \in \mathbb{R}^{n_k \times n}$; the representation of a signal $x \in \mathbb{R}^n$ is the block vector $w = Bx$ with the blocks $B[1]x, \dots, B[K]x$.

From now on, the dimension of $W$ is denoted by $N$:

$$N = n_1 + \dots + n_K.$$

The factors $\mathbb{R}^{n_k}$ of the representation space $W$ are equipped with norms $\|\cdot\|_{(k)}$; the conjugate norms are denoted by $\|\cdot\|_{(k,*)}$. A vector $w = [w[1]; \dots; w[K]]$ from $W$ is called $s$-block-sparse, if the number of nonzero blocks $w[k] \in \mathbb{R}^{n_k}$ in $w$ is at most $s$. A vector $x \in \mathbb{R}^n$ will be called $s$-block-sparse, if its representation $Bx$ is so. We refer to the collection $\mathcal{S} = (B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$ as the representation structure (r.s. for short).

For $w \in W$, we call the number $\|w[k]\|_{(k)}$ the magnitude of the $k$-th block in $w$, and denote by $w^s$ the representation vector obtained from $w$ by zeroing out all but the $s$ largest in magnitude blocks in $w$ (with the ties resolved arbitrarily). For $I \subset \{1, \dots, K\}$ and a representation vector $w$, $w_I$ denotes the vector obtained from $w$ by keeping intact the blocks $w[k]$ with $k \in I$ and zeroing out all remaining blocks. For $w \in W$ and $1 \le p \le \infty$, we denote by $L_p(w)$ the $\|\cdot\|_p$-norm of the vector $[\|w[1]\|_{(1)}; \dots; \|w[K]\|_{(K)}]$, so that $L_p(\cdot)$ is a norm on $W$ with the conjugate norm $L_p^*(w) = \|[\|w[1]\|_{(1,*)}; \dots; \|w[K]\|_{(K,*)}]\|_{p_*}$, $p_* = \frac{p}{p-1}$. Given a positive integer $s \le K$, we set $L_{s,p}(w) = L_p(w^s)$. Note that $L_{s,p}(\cdot)$ is a norm on $W$.
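The block norms just introduced are easy to compute. The following is a minimal sketch in plain Python, assuming (for the example only) that all blocks carry the same $\ell_r$ norm and that a block vector is stored as a flat list together with the list of block sizes $n_1, \dots, n_K$; the function names are hypothetical.

```python
import math

def block_magnitudes(w, sizes, r=2.0):
    """Split the flat vector w into blocks of the given sizes and return
    the l_r norm (magnitude) of each block."""
    mags, start = [], 0
    for n_k in sizes:
        block = w[start:start + n_k]
        if math.isinf(r):
            mags.append(max(abs(v) for v in block))
        else:
            mags.append(sum(abs(v) ** r for v in block) ** (1.0 / r))
        start += n_k
    return mags

def vec_norm(v, p):
    """l_p norm of a vector of nonnegative magnitudes."""
    if math.isinf(p):
        return max(v, default=0.0)
    return sum(x ** p for x in v) ** (1.0 / p)

def L_p(w, sizes, p, r=2.0):
    """L_p(w): the l_p norm of the vector of block magnitudes."""
    return vec_norm(block_magnitudes(w, sizes, r), p)

def L_sp(w, sizes, s, p, r=2.0):
    """L_{s,p}(w) = L_p(w^s): keep only the s largest block magnitudes."""
    top = sorted(block_magnitudes(w, sizes, r), reverse=True)[:s]
    return vec_norm(top, p)

def tail_L1(w, sizes, s, r=2.0):
    """L_1(w - w^s): sum of all but the s largest block magnitudes."""
    return sum(sorted(block_magnitudes(w, sizes, r), reverse=True)[s:])
```

For instance, with `w = [3, 4, 0, 0, 1, 0]` and three blocks of size 2 under the Euclidean block norm, the magnitudes are 5, 0 and 1, so $L_1(w) = 6$, $L_{1,\infty}(w) = 5$, and the $L_1$-distance from $w$ to its best 1-block-sparse approximation is 1.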



Problem of interest is as follows: given an observation

$$y = Ax + u + \xi, \qquad (2.5)$$

of an unknown signal $x \in \mathbb{R}^n$, we want to recover the representation $Bx$ of $x$, knowing in advance that this representation is "nearly $s$-block-sparse," that is, the representation can be approximated by an $s$-block-sparse one; the $L_1$-error of this approximation will enter our error bounds.

In (2.5), the term $u + \xi$ is the observation error; in this error, $u$ is an unknown nuisance known to belong to a given compact convex set $\mathcal{U} \subset \mathbb{R}^m$ symmetric w.r.t. the origin, and $\xi$ is random noise with known distribution $P$.
Condition $Q_{s,q}(\kappa)$ We start with introducing the condition which will be instrumental in all subsequent constructions and results. Let a sensing matrix $A$ and an r.s. $\mathcal{S} = (B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$ be given, and let $s \le K$ be a positive integer, $q \in [1, \infty]$ and $\kappa > 0$. We say that a pair $(H, \|\cdot\|)$, where $H \in \mathbb{R}^{m \times N}$ and $\|\cdot\|$ is a norm on $\mathbb{R}^N$, satisfies the condition $Q_{s,q}(\kappa)$ associated with the matrices $A$, $B$ and the r.s., if

$$\forall x \in \mathbb{R}^n: \quad L_{s,q}(Bx) \le s^{1/q} \|H^T A x\| + \kappa s^{1/q - 1} L_1(Bx). \qquad (2.6)$$

The following observation is evident:

Observation 2.1 Given $A$ and an r.s. $(B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$, let $(H, \|\cdot\|)$ satisfy $Q_{s,q}(\kappa)$. Then $(H, \|\cdot\|)$ satisfies $Q_{s,q'}(\kappa')$ for all $q' \in [1, q]$ and $\kappa' \ge \kappa$. Besides this, if $s' \le s$ is a positive integer, $((s/s')^{1/q} H, \|\cdot\|)$ satisfies $Q_{s',q}((s'/s)^{1 - 1/q} \kappa)$. Further, if $(H, \|\cdot\|)$ satisfies $Q_{s,q}(\kappa)$, $q' \ge q$, and $\kappa'$ and a positive integer $s'$ are such that $\kappa' (s')^{1/q' - 1} \ge \kappa s^{1/q - 1}$, then $(s^{1/q} (s')^{-1/q'} H, \|\cdot\|)$ satisfies $Q_{s',q'}(\kappa')$. In particular, when $s' \le s^{1 - 1/q}$, the fact that $(H, \|\cdot\|)$ satisfies $Q_{s,q}(\kappa)$ implies that $(s^{1/q} H, \|\cdot\|)$ satisfies $Q_{s',\infty}(\kappa)$.
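To see where the rescaling for a smaller sparsity level $s' \le s$ comes from, the following short derivation may help. It is a sketch that uses only the definition (2.6) and the monotonicity of $L_{s,q}(\cdot)$ in $s$; the exponents are those implied by (2.6).

```latex
\begin{align*}
L_{s',q}(Bx) \;\le\; L_{s,q}(Bx)
  &\le s^{1/q}\,\|H^T A x\| + \kappa\, s^{1/q-1} L_1(Bx) \\
  &= (s')^{1/q}\,\bigl\|\bigl[(s/s')^{1/q} H\bigr]^T A x\bigr\|
     + \bigl[(s'/s)^{1-1/q}\,\kappa\bigr]\,(s')^{1/q-1} L_1(Bx).
\end{align*}
```

The last line is exactly (2.6) written for the sparsity level $s'$, the contrast matrix $(s/s')^{1/q} H$ and the parameter $(s'/s)^{1-1/q}\kappa \le \kappa$.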

Relation to known conditions for the validity of sparse $\ell_1$ recovery. Note that whenever

$$(B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$$

is the standard r.s., meaning that $B$ is the identity matrix, $n_1 = \dots = n_K = 1$ and $\|\cdot\|_{(k)} = |\cdot|$ for all $k$, the condition $Q_{s,q}(\kappa)$ reduces to the condition $H_{s,q}(\kappa)$ introduced in [22]. On the other hand, condition $Q_{s,q}(\kappa)$ is closely related to known conditions, introduced to study the properties of recovery routines in the context of block-sparsity. Specifically, consider an r.s. with $B = I_n$, and let us make the following observation:
Let $(H, \|\cdot\|_\infty)$ satisfy $Q_{s,q}(\kappa)$ and let $\lambda$ be the maximum of the Euclidean norms of columns in $H$. Then

$$\forall x \in \mathbb{R}^n: \quad L_{s,q}(x) \le \lambda s^{1/q} \|Ax\|_2 + \kappa s^{1/q - 1} L_1(x). \qquad (2.7)$$
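The step from $Q_{s,q}(\kappa)$ to (2.7) is a one-line application of the Cauchy-Schwarz inequality; the following sketch spells it out, writing $h^i$ for the columns of $H$ (so that $\lambda = \max_i \|h^i\|_2$).

```latex
\begin{align*}
\|H^T A x\|_\infty &= \max_i \bigl|(h^i)^T A x\bigr|
  \;\le\; \max_i \|h^i\|_2\, \|Ax\|_2 \;=\; \lambda \|Ax\|_2, \\
L_{s,q}(x) &\le s^{1/q}\, \|H^T A x\|_\infty + \kappa\, s^{1/q-1} L_1(x)
  \;\le\; \lambda\, s^{1/q}\, \|Ax\|_2 + \kappa\, s^{1/q-1} L_1(x).
\end{align*}
```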



Let us fix the r.s. $\mathcal{S}_2 = (I_n, n_1, \dots, n_K, \|\cdot\|_2, \dots, \|\cdot\|_2)$. Condition (2.7) with $\kappa < 1/2$ plays a crucial role in the performance analysis of group-Lasso and Dantzig Selector. For example, the error bounds for Lasso recovery obtained in [26] rely upon the Restricted Eigenvalue assumption RE$(s, \varkappa)$ as follows: there is $\varkappa > 0$ such that

$$L_2(x^s) \le \frac{1}{\varkappa} \|Ax\|_2 \quad \text{whenever} \quad 3L_1(x^s) \ge L_1(x - x^s).$$

In this case $L_{s,1}(x) \le \sqrt{s}\, L_{s,2}(x) \le \frac{\sqrt{s}}{\varkappa} \|Ax\|_2$ whenever $4L_{s,1}(x) \ge L_1(x)$, so that

$$\forall x \in \mathbb{R}^n: \quad L_{s,1}(x) \le \frac{\sqrt{s}}{\varkappa} \|Ax\|_2 + \frac{1}{4} L_1(x), \qquad (2.8)$$

which is (2.7) with $q = 1$, $\kappa = 1/4$ and $\lambda = (\varkappa \sqrt{s})^{-1}$ (observe that (2.8) is nothing but the "block version" of the Compatibility condition from [7]).

Recall that a sensing matrix $A \in \mathbb{R}^{m \times n}$ satisfies the Block Restricted Isometry Property BRIP$(\delta, k)$ (see, e.g., [16]) with $\delta \ge 0$ and a positive integer $k$ if for every $x \in \mathbb{R}^n$ with at most $k$ nonvanishing blocks one has

$$(1 - \delta) \|x\|_2^2 \le x^T A^T A x \le (1 + \delta) \|x\|_2^2. \qquad (2.9)$$

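Proposition 2.1 Let $A \in \mathbb{R}^{m \times n}$ satisfy BRIP$(\delta, 2s)$ for some $\delta < 1$ and positive integer $s$. Then

(i) The pair $\left(H = \frac{s^{-1/2}}{\sqrt{1 - \delta}} I_m, \|\cdot\|_2\right)$ satisfies the condition $Q_{s,2}\left(\frac{\delta}{1 - \delta}\right)$ associated with $A$ and the r.s. $\mathcal{S}_2$.

(ii) The pair $\left(H = \frac{1}{1 - \delta} A, L_\infty(\cdot)\right)$ satisfies the condition $Q_{s,2}\left(\frac{\delta}{1 - \delta}\right)$ associated with $A$ and the r.s. $\mathcal{S}_2$.

Our last observation here is as follows: let $(H, \|\cdot\|)$ satisfy $Q_{s,q}(\kappa)$, the r.s. being $(B, n_1, \dots, n_K, \|\cdot\|_2, \dots, \|\cdot\|_2)$, and let $d = \max_k n_k$. Then $(H, \|\cdot\|)$ satisfies $Q_{s,q}(\sqrt{d}\,\kappa)$, the r.s. being $(B, n_1, \dots, n_K, \|\cdot\|_\infty, \dots, \|\cdot\|_\infty)$.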
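3 Accuracy bounds for $\ell_1$ block recovery routines

Throughout this section we fix an r.s. $\mathcal{S} = (B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$ and a sensing matrix $A$.

3.1 Regular $\ell_1$ recovery

We define the regular $\ell_1$ recovery as

$$\hat x_{reg}(y) \in \mathrm{Argmin}_u \left\{ L_1(Bu) : \|H^T(Au - y)\| \le \rho \right\}, \qquad (3.10)$$

where the contrast matrix $H \in \mathbb{R}^{m \times N}$, the norm $\|\cdot\|$ and $\rho \ge 0$ are parameters of the construction.

Theorem 3.1 Let $s$ be a positive integer, $q \in [1, \infty]$, $\kappa \in (0, 1/2)$, and $\epsilon \in (0, 1)$. Assume that the pair $(H, \|\cdot\|)$ satisfies the condition $Q_{s,q}(\kappa)$ associated with $A$ and r.s. $\mathcal{S}$ and that there exists a set $\Xi \subset \mathbb{R}^m$ satisfying $P(\Xi) \ge 1 - \epsilon$ and

$$\|H^T(u + \xi)\| \le \rho \quad \forall (u \in \mathcal{U}, \xi \in \Xi). \qquad (3.11)$$

Then for all $x \in \mathbb{R}^n$, $u \in \mathcal{U}$ and $\xi \in \Xi$ one has

$$L_p(B[\hat x_{reg}(Ax + u + \xi) - x]) \le \frac{4(2s)^{1/p}}{1 - 2\kappa} \left[ \rho + \frac{1}{2s} L_1(Bx - [Bx]^s) \right], \quad 1 \le p \le q. \qquad (3.12)$$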
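To get a feeling for the scale of the guarantee, one can evaluate the right hand side of the bound of Theorem 3.1, read as $\frac{4(2s)^{1/p}}{1-2\kappa}\left[\rho + \frac{1}{2s}L_1(Bx - [Bx]^s)\right]$. The sketch below is only an evaluator of this expression; the function name and argument names are hypothetical, and `tail_l1` stands for $L_1(Bx - [Bx]^s)$.

```python
import math

def regular_recovery_bound(p, s, kappa, rho, tail_l1):
    """Right hand side of the Theorem 3.1 bound:
    4 (2s)^{1/p} / (1 - 2 kappa) * (rho + tail_l1 / (2 s)).

    p may be math.inf, in which case (2s)^{1/p} is taken to be 1.
    """
    if not 0.0 < kappa < 0.5:
        raise ValueError("kappa must lie in (0, 1/2)")
    scale = 1.0 if math.isinf(p) else (2.0 * s) ** (1.0 / p)
    return 4.0 * scale / (1.0 - 2.0 * kappa) * (rho + tail_l1 / (2.0 * s))
```

For example, with $s = 2$, $\kappa = 1/4$, $\rho = 0.1$ and a tail of $0.4$, the expression equals $6.4$ for $p = 1$ and $1.6$ for $p = \infty$; the factor $1/(1 - 2\kappa)$ makes the bound deteriorate quickly as $\kappa$ approaches $1/2$.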



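The above result can be slightly strengthened by replacing the assumption that $(H, \|\cdot\|)$ satisfies $Q_{s,q}(\kappa)$, $\kappa < 1/2$, with a weaker, by Observation 2.1, assumption that $(H, \|\cdot\|)$ satisfies $Q_{s,1}(\varkappa)$ with $\varkappa < 1/2$ and satisfies $Q_{s,q}(\kappa)$ with some (perhaps large) $\kappa$:

Theorem 3.2 Given $A$, r.s. $\mathcal{S}$, integer $s > 0$, $q \in [1, \infty]$ and $\epsilon \in (0, 1)$, assume that $(H, \|\cdot\|)$ satisfies the condition $Q_{s,1}(\varkappa)$ with $\varkappa < 1/2$ and the condition $Q_{s,q}(\kappa)$ with some $\kappa \ge \varkappa$, and let $\rho$ be such that there exists a set $\Xi$ satisfying $P(\Xi) \ge 1 - \epsilon$ and

$$\|H^T(u + \xi)\| \le \rho \quad \forall (u \in \mathcal{U}, \xi \in \Xi).$$

Then for all $x \in \mathbb{R}^n$, $u \in \mathcal{U}$, $\xi \in \Xi$ and $p$, $1 \le p \le q$, it holds:

$$L_p(B[\hat x_{reg}(Ax + u + \xi) - x]) \le \frac{4(2s)^{1/p} [1 + \kappa - \varkappa]^{\frac{q(p-1)}{p(q-1)}}}{1 - 2\varkappa} \left[ \rho + \frac{1}{2s} L_1(Bx - [Bx]^s) \right]. \qquad (3.13)$$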



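3.2 Penalized $\ell_1$ recovery

Penalized $\ell_1$ recovery is

$$\hat x_{pen}(y) \in \mathrm{Argmin}_u \left\{ L_1(Bu) + \lambda \|H^T(Au - y)\| \right\}, \qquad (3.14)$$

where $H \in \mathbb{R}^{m \times N}$, $\|\cdot\|$ and a positive real $\lambda$ are parameters of the construction.

Theorem 3.3 Given $A$, r.s. $\mathcal{S}$, integer $s$, $q \in [1, \infty]$ and $\epsilon \in (0, 1)$, assume that $(H, \|\cdot\|)$ satisfies the conditions $Q_{s,q}(\kappa)$ and $Q_{s,1}(\varkappa)$ with $\varkappa < 1/2$ and $\kappa \ge \varkappa$.

(i) Let $\lambda \ge 2s$. Then for all $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$ it holds for $1 \le p \le q$:

$$L_p(B[\hat x_{pen}(y) - x]) \le \frac{4\lambda^{1/p}}{1 - 2\varkappa} \left[ 1 + \frac{\kappa\lambda}{2s} - \varkappa \right]^{\frac{q(p-1)}{p(q-1)}} \left[ \|H^T(Ax - y)\| + \frac{1}{2s} L_1(Bx - [Bx]^s) \right]. \qquad (3.15)$$

In particular, with $\lambda = 2s$ we have

$$L_p(B[\hat x_{pen}(y) - x]) \le \frac{4(2s)^{1/p}}{1 - 2\varkappa} [1 + \kappa - \varkappa]^{\frac{q(p-1)}{p(q-1)}} \left[ \|H^T(Ax - y)\| + \frac{1}{2s} L_1(Bx - [Bx]^s) \right], \quad 1 \le p \le q. \qquad (3.16)$$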
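(ii) Let $\rho \ge 0$ be such that the set $\Xi_\rho = \{\xi : \|H^T(u + \xi)\| \le \rho \ \forall u \in \mathcal{U}\}$ satisfies $\mathrm{Prob}\{\xi \in \Xi_\rho\} \ge 1 - \epsilon$.

Then for all $x \in \mathbb{R}^n$, $u \in \mathcal{U}$ and all $\xi \in \Xi_\rho$ one has for $1 \le p \le q$:

$$L_p(B[\hat x_{pen}(Ax + u + \xi) - x]) \le \frac{4\lambda^{1/p}}{1 - 2\varkappa} \left[ 1 + \frac{\kappa\lambda}{2s} - \varkappa \right]^{\frac{q(p-1)}{p(q-1)}} \left[ \rho + \frac{1}{2s} L_1(Bx - [Bx]^s) \right],$$

$$\lambda = 2s \ \Rightarrow \ L_p(B[\hat x_{pen}(Ax + u + \xi) - x]) \le \frac{4(2s)^{1/p}}{1 - 2\varkappa} [1 + \kappa - \varkappa]^{\frac{q(p-1)}{p(q-1)}} \left[ \rho + \frac{1}{2s} L_1(Bx - [Bx]^s) \right]. \qquad (3.17)$$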



Discussion. Let us compare the error bounds of the regular and the penalized $\ell_1$ recoveries associated with the same pair $(H, \|\cdot\|)$ satisfying the condition $Q_{s,q}(\kappa)$ with $\kappa < 1/2$. Let

$$\rho_\epsilon[H, \|\cdot\|] = \min \left\{ \rho : \mathrm{Prob}\{\xi : \|H^T(u + \xi)\| \le \rho \ \forall u \in \mathcal{U}\} \ge 1 - \epsilon \right\}; \qquad (3.18)$$

this is nothing but the smallest $\rho$ meeting the condition (3.11) with $\Xi$ satisfying $\mathrm{Prob}\{\xi \in \Xi\} \ge 1 - \epsilon$ and thus the smallest $\rho$ for which the error bound (3.12) for the regular $\ell_1$ recovery holds true with probability $1 - \epsilon$ (or at least the smallest $\rho$ for which the latter claim is supported by Theorem 3.1). With $\rho = \rho_\epsilon[H, \|\cdot\|]$, the regular $\ell_1$ recovery guarantees (and that is the best guarantee one can extract from Theorem 3.1) that

(!) For some set $\Xi$, $\mathrm{Prob}\{\xi \in \Xi\} \ge 1 - \epsilon$, of "good" realizations of the random component $\xi$ of the observation error, one has

$$L_p(B[\hat x(Ax + u + \xi) - x]) \le \frac{4(2s)^{1/p}}{1 - 2\kappa} \left[ \rho_\epsilon[H, \|\cdot\|] + \frac{1}{2s} L_1(Bx - [Bx]^s) \right], \quad 1 \le p \le q, \qquad (3.19)$$

whenever $x \in \mathbb{R}^n$, $u \in \mathcal{U}$, $\xi \in \Xi$.

The error bound (3.16) (where we can safely set $\varkappa = \kappa$, since $Q_{s,q}(\kappa)$ implies $Q_{s,1}(\kappa)$) says that (!) holds true for the penalized $\ell_1$ recovery with $\lambda = 2s$. The latter observation suggests that the penalized $\ell_1$ recovery associated with $(H, \|\cdot\|)$ and $\lambda = 2s$ is better than its regular counterpart, the reason being twofold. First, in order to ensure (!) with the regular recovery, the "built in" parameter $\rho$ of this recovery should be set to $\rho_\epsilon[H, \|\cdot\|]$, and the latter quantity is not always easy to identify. In contrast to this, the construction of penalized $\ell_1$ recovery is completely independent of a priori assumptions on the structure of observation errors, while automatically ensuring (!) for the error model we use. Second, and more importantly, for the penalized recovery the bound (3.19) is no more than the "worst, with confidence $1 - \epsilon$, case," while the typical values of the quantity $\|H^T(u + \xi)\|$ which indeed participates in the error bound (3.15) are essentially smaller than $\rho_\epsilon[H, \|\cdot\|]$. Our numerical experience fully supports the above suggestion: the difference in observed performance of the two routines in question, although not dramatic, is definitely in favour of the penalized recovery. The only potential disadvantage of the latter routine is that the penalty parameter $\lambda$ should be tuned to the level $s$ of sparsity we aim at, while the regular recovery is free of any guess of this type. Of course, the "tuning" is rather loose: all we need (and experiments show that we indeed need this) is the relation $\lambda \ge 2s$, so that a rough upper bound on $s$ will do; note, however, that the bound (3.15) deteriorates as $\lambda$ grows.
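The remark that typical values of $\|H^T(u + \xi)\|$ are well below the $(1-\epsilon)$-quantile $\rho_\epsilon[H, \|\cdot\|]$ can be illustrated by simulation. The sketch below makes several assumptions that are not in the text: there is no nuisance ($\mathcal{U} = \{0\}$), $\xi \sim \mathcal{N}(0, I_m)$, and the norm is $\|\cdot\|_\infty$. It returns an empirical quantile and the empirical median, which are Monte Carlo estimates rather than guaranteed bounds; the function name is hypothetical.

```python
import math
import random

def simulate_contrast_noise(H_columns, eps, n_samples=20000, seed=0):
    """Monte Carlo estimate of the (1 - eps)-quantile and of the median of
    ||H^T xi||_inf for xi ~ N(0, I_m), assuming no nuisance (U = {0}).

    H_columns: list of the N columns h^1, ..., h^N of H, each of length m.
    Returns (quantile_estimate, median_estimate).
    """
    rng = random.Random(seed)
    m = len(H_columns[0])
    values = []
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in range(m)]
        values.append(max(abs(sum(hj * xj for hj, xj in zip(h, xi)))
                          for h in H_columns))
    values.sort()
    q_index = min(n_samples - 1, math.ceil((1.0 - eps) * n_samples) - 1)
    return values[q_index], values[n_samples // 2]
```

With a single unit column and $\epsilon = 0.05$, the quantile estimate is close to 1.96 while the median is close to 0.67, which is the kind of gap between the worst-case level and typical noise levels referred to above.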



4 Tractability of condition $Q_{s,\infty}(\kappa)$, $\ell_\infty$-norm of the blocks

We have seen in Section 3 that given a sensing matrix $A$ and an r.s. $\mathcal{S} = (B, n_1, \dots, n_K, \|\cdot\|_{(1)}, \dots, \|\cdot\|_{(K)})$ such that the associated conditions $Q_{s,q}(\kappa)$ are satisfiable, we can validate $\ell_1$-recovery of nearly $s$-block-sparse signals, specifically, can point out $\ell_1$-type recoveries with controlled (and small, provided so are the observation error and the deviation of the signal from an $s$-block-sparse one) recovery error. The bad news here is that, in general, condition $Q_{s,q}(\kappa)$, as well as other conditions for the validity of $\ell_1$ recovery, like Block RE or RIP, cannot be verified efficiently. The latter means that given a sensing matrix $A$ and an r.s. $\mathcal{S}$, it is difficult to verify that a given candidate pair $(H, \|\cdot\|)$ satisfies the condition $Q_{s,q}(\kappa)$ associated with $A$, $\mathcal{S}$. Fortunately, one can construct "tractable approximations" of condition $Q_{s,q}(\kappa)$, i.e., verifiable sufficient conditions for the validity of $Q_{s,q}(\kappa)$. The first good news is that when all $\|\cdot\|_{(k)}$ are the uniform norms $\|\cdot\|_\infty$ and, in addition, $q = \infty$ (which, by Observation 2.1, corresponds to the strongest among the conditions $Q_{s,q}(\kappa)$ and ensures the validity of (3.12), (3.15) in the largest possible range $1 \le p \le \infty$ of values of $p$), the condition $Q_{s,q}(\kappa)$ becomes "fully computationally tractable." We intend to demonstrate also that this condition $Q_{s,\infty}(\kappa)$ is in fact necessary for the risk bounds of the form (3.12), (3.17) to be valid when $p = \infty$.



4.1 Condition $Q_{s,\infty}(\kappa)$: tractability and the optimal choice of the contrast $H$

Notation. In the sequel, given $r, \theta \in [1, \infty]$ and a matrix $M$, we denote by $\|M\|_{r,\theta}$ the norm of the linear operator $u \mapsto Mu$ induced by the norms $\|\cdot\|_r$ and $\|\cdot\|_\theta$ on the origin and the destination spaces:

$$\|M\|_{r,\theta} = \max_{u : \|u\|_r \le 1} \|Mu\|_\theta.$$

We denote by $\|M\|_{(\ell,k)}$ the norm of the linear mapping $u \mapsto Mu : \mathbb{R}^{n_\ell} \to \mathbb{R}^{n_k}$ induced by the norms $\|\cdot\|_{(\ell)}$, $\|\cdot\|_{(k)}$ on the argument and on the image spaces. Further, $\mathrm{Row}_k[M]$ stands for the transpose of the $k$-th row of $M$ and $\mathrm{Col}_k[M]$ stands for the $k$-th column of $M$. Finally, $\|u\|_{s,q}$ is the $\ell_q$-norm of the vector obtained from a vector $u$ by zeroing all but the $s$ largest in magnitude entries in $u$.

Main result. Consider the r.s. $\mathcal{S}_\infty = (B, n_1, \dots, n_K, \|\cdot\|_\infty, \dots, \|\cdot\|_\infty)$. We claim that in this case the condition $Q_{s,\infty}(\kappa)$ becomes fully tractable. Specifically, we have the following

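Proposition 4.1 Let a matrix $A \in \mathbb{R}^{m \times n}$, the r.s. $\mathcal{S}_\infty$, a positive integer $s$ and reals $\kappa > 0$, $\epsilon \in (0, 1)$ be given.

(i) Assume that a triple $(H, \|\cdot\|, \rho)$, where $H \in \mathbb{R}^{m \times N}$, $\|\cdot\|$ is a norm on $\mathbb{R}^N$, and $\rho \ge 0$, is such that

(!) $(H, \|\cdot\|)$ satisfies $Q_{s,\infty}(\kappa)$, and the set $\Xi = \{\xi : \|H^T[u + \xi]\| \le \rho \ \forall u \in \mathcal{U}\}$ satisfies $P(\Xi) \ge 1 - \epsilon$.

Given $H$, $\|\cdot\|$, $\rho$, one can find efficiently $N = n_1 + \dots + n_K$ vectors $h^1, \dots, h^N$ in $\mathbb{R}^m$ and an $N \times N$ block matrix $V = [V^{k\ell}]_{k,\ell=1}^K$ (the blocks $V^{k\ell}$ of $V$ are $n_k \times n_\ell$ matrices) such that

(a) $B = VB + [h^1, \dots, h^N]^T A$,

(b) $\|V^{k\ell}\|_{\infty,\infty} \le s^{-1}\kappa \quad \forall k, \ell \le K$, $\qquad$ (4.20)

(c) $P\left(\Xi^+ := \left\{\xi : \max_{u \in \mathcal{U}} u^T h^i + |\xi^T h^i| \le \rho, \ 1 \le i \le N\right\}\right) \ge 1 - \epsilon$

(note that the matrix norm $\|A\|_{\infty,\infty} = \max_j \|\mathrm{Row}_j[A]\|_1$ is simply the maximal $\ell_1$-norm of the rows of $A$).

(ii) Whenever vectors $h^1, \dots, h^N \in \mathbb{R}^m$ and a matrix $V = [V^{k\ell}]_{k,\ell=1}^K$ with $n_k \times n_\ell$ blocks $V^{k\ell}$ satisfy (4.20), the $m \times N$ matrix $H = [h^1, \dots, h^N]$, the norm $\|\cdot\|_\infty$ on $\mathbb{R}^N$ and $\rho$ form a triple satisfying (!).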
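The deterministic part of the certificate, the relation $B = VB + H^T A$ together with the bound on the blocks $\|V^{k\ell}\|_{\infty,\infty}$, can be checked mechanically. The following sketch, with hypothetical function names and matrices stored as lists of rows, verifies the matrix identity up to a tolerance and returns the smallest $\kappa$ for which $\max_{k,\ell} \|V^{k\ell}\|_{\infty,\infty} \le \kappa / s$ holds; it does not address the probabilistic requirement on $\rho$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def certificate_kappa(A, B, H, V, sizes, s, tol=1e-9):
    """Check B = V B + H^T A and return the smallest kappa such that
    max_{k,l} ||V^{kl}||_{inf,inf} <= kappa / s.
    Returns None if the matrix identity fails.

    A: m x n, B: N x n, H: m x N, V: N x N, sizes: [n_1, ..., n_K].
    """
    VB = matmul(V, B)
    HtA = matmul(transpose(H), A)
    for i, row in enumerate(B):
        for j, b in enumerate(row):
            if abs(b - VB[i][j] - HtA[i][j]) > tol:
                return None
    offsets = [sum(sizes[:k]) for k in range(len(sizes))]
    worst = 0.0
    for k, n_k in enumerate(sizes):
        for l, n_l in enumerate(sizes):
            # ||V^{kl}||_{inf,inf} is the largest l_1 norm of a row of the block.
            for i in range(n_k):
                row_l1 = sum(abs(V[offsets[k] + i][offsets[l] + j])
                             for j in range(n_l))
                worst = max(worst, row_l1)
    return s * worst
```

A returned value below $1/2$ means that the pair $(H, L_\infty(\cdot))$ is certified to satisfy $Q_{s,\infty}(\kappa)$ with a $\kappa$ for which the error bounds of Section 3 apply.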

Discussion. Let a sensing matrix $A \in \mathbb{R}^{m \times n}$ and an r.s. $\mathcal{S}_\infty$ be given, along with a positive integer $s$, an uncertainty set $\mathcal{U}$, a distribution $P$ of $\xi$, and $\epsilon \in (0, 1)$. Theorems 3.1, 3.3 say that if a triple $(H, \|\cdot\|, \rho)$ is such that $(H, \|\cdot\|)$ satisfies $Q_{s,\infty}(\kappa)$ with $\kappa < 1/2$ and $H$, $\rho$ are such that for the set

$$\Xi = \{\xi : \|H^T[u + \xi]\| \le \rho \ \forall u \in \mathcal{U}\}$$

it holds $P(\Xi) \ge 1 - \epsilon$, then for the regular $\ell_1$ recovery associated with $(H, \|\cdot\|, \rho)$ and for the penalized $\ell_1$ recovery associated with $(H, \|\cdot\|)$ and $\lambda = 2s$, the following holds:

$$\forall (x \in \mathbb{R}^n, u \in \mathcal{U}, \xi \in \Xi): \quad L_p(B[\hat x(Ax + u + \xi) - x]) \le \frac{4(2s)^{1/p}}{1 - 2\kappa} \left[ \rho + \frac{1}{2s} L_1(Bx - [Bx]^s) \right], \quad 1 \le p \le \infty. \qquad (4.21)$$

Proposition 4.1 states that when applying this result, we lose nothing by restricting ourselves to triples $H = [h^1, \dots, h^N] \in \mathbb{R}^{m \times N}$, $N = n_1 + \dots + n_K$, $\|\cdot\| = L_\infty(\cdot)$, $\rho \ge 0$ which can be augmented by an appropriately chosen $N \times N$ matrix $V$ to satisfy relations (4.20). In the rest of this discussion, it is assumed that we are speaking about triples $(H, \|\cdot\|, \rho)$ satisfying the just defined restrictions.



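The bound (4.21) is completely determined by two parameters: $\kappa$ (which should be $< 1/2$) and $\rho$; the smaller these parameters are, the better the bounds. In what follows we address the issue of efficient synthesis of matrices $H$ with "as good as possible" values of $\kappa$ and $\rho$.

Observe first that $H = [h^1, \dots, h^N]$ and $\kappa$ should admit an extension by a matrix $V$ to a solution of the system of convex constraints (4.20.a), (4.20.b). In the case of $\xi \equiv 0$ the best choice of $\rho$, given $H$, is

$$\rho = \max_i \rho_{\mathcal{U}}(h^i), \quad \text{where} \quad \rho_{\mathcal{U}}(h) = \max_{u \in \mathcal{U}} u^T h.$$

Consequently, in this case the "achievable pairs" $\rho$, $\kappa$ form a computationally tractable convex set

$$G_s = \left\{ (\kappa, \rho) : \exists H = [h^1, \dots, h^N] \in \mathbb{R}^{m \times N},\ V = [V^{k\ell} \in \mathbb{R}^{n_k \times n_\ell}]_{k,\ell=1}^K : \ B = VB + H^T A, \ \|V^{k\ell}\|_{\infty,\infty} \le \frac{\kappa}{s} \ \forall k, \ell \le K, \ \rho_{\mathcal{U}}(h^i) \le \rho, \ 1 \le i \le N \right\}.$$

When $\xi$ does not vanish, the situation is complicated by the necessity to maintain the validity of the restriction

$$P(\Xi^+) := P\left\{ \xi : \rho_{\mathcal{U}}(h^i) + |\xi^T h^i| \le \rho, \ 1 \le i \le N \right\} \ge 1 - \epsilon, \qquad (4.22)$$

which is a chance constraint in variables $h^1, \dots, h^N$, $\rho$ and as such can be "computationally intractable." Let us consider the "standard" case of Gaussian zero mean noise $\xi$, that is, assume that $\xi = D\eta$ with $\eta \sim \mathcal{N}(0, I_m)$ and known $D \in \mathbb{R}^{m \times m}$. Then (4.22) implies that

$$\rho \ge \max_i \left[ \rho_{\mathcal{U}}(h^i) + \mathrm{Erfinv}\!\left(\frac{\epsilon}{2}\right) \|D^T h^i\|_2 \right].$$

On the other hand, (4.22) is clearly implied by

$$\rho \ge \max_i \left[ \rho_{\mathcal{U}}(h^i) + \mathrm{Erfinv}\!\left(\frac{\epsilon}{2N}\right) \|D^T h^i\|_2 \right], \quad 1 \le i \le N.$$

Ignoring the "gap" between $\mathrm{Erfinv}\!\left(\frac{\epsilon}{2}\right)$ and $\mathrm{Erfinv}\!\left(\frac{\epsilon}{2N}\right)$, we can safely model the restriction (4.22) by the system of convex constraints

$$\rho_{\mathcal{U}}(h^i) + \mathrm{Erfinv}\!\left(\frac{\epsilon}{2N}\right) \|D^T h^i\|_2 \le \rho, \quad 1 \le i \le N. \qquad (4.23)$$

Thus, the set $G_s$ of admissible $\kappa$, $\rho$ can be safely approximated by the computationally tractable convex set

$$\left\{ (\kappa, \rho) : \exists H = [h^1, \dots, h^N] \in \mathbb{R}^{m \times N},\ V = [V^{k\ell} \in \mathbb{R}^{n_k \times n_\ell}]_{k,\ell=1}^K : \ B = VB + H^T A, \ \|V^{k\ell}\|_{\infty,\infty} \le \frac{\kappa}{s}, \ 1 \le k, \ell \le K, \ \max_{u \in \mathcal{U}} u^T h^i + \mathrm{Erfinv}\!\left(\frac{\epsilon}{2N}\right) \|D^T h^i\|_2 \le \rho, \ 1 \le i \le N \right\}. \qquad (4.24)$$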



4.2 Condition $Q_{s,\infty}(\kappa)$: necessity

In this section, as above, we assume that all norms $\|\cdot\|_{(k)}$ in the r.s. $\mathcal S_\infty$ are $\ell_\infty$-norms; we assume, in
addition, that $\xi$ is a zero mean Gaussian noise: $\xi = D\eta$ with $\eta \sim \mathcal N(0, I_m)$ and known $D \in \mathbb{R}^{m\times m}$. From
the above discussion we know that if, for some $\kappa < 1/2$ and $\rho > 0$, there exist $H = [h^1,...,h^N] \in \mathbb{R}^{m\times N}$ and
$V = [V^{k\ell} \in \mathbb{R}^{n_k\times n_\ell}]_{k,\ell=1}^K$ satisfying (4.20), then regular and penalized $\ell_1$ recoveries with appropriate choice
of parameters ensure that



V(x eW,u£U): 

Proh{C:\\B[x-x{Ax + u + moo < [p + l^Li{Bx - [Bx]')]} > 1 - e. 



(4.25) 



We are about to demonstrate that this implication can be "nearly inverted:"

Proposition 4.2 Let a sensing matrix $A$, an r.s. $\mathcal S_\infty$ with $\|\cdot\|_{(k)} = \|\cdot\|_\infty$, $1 \le k \le K$, an uncertainty set
$\mathcal U$, and reals $\kappa > 0$, $\epsilon \in (0, 1/2)$ be given. Suppose that the observation error "is present," specifically, that
for every $r > 0$, the set $\{u + De : u \in \mathcal U, \|e\|_2 \le r\}$ contains a neighborhood of the origin.

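Given a positive integer $S$, assume that there exists a recovering routine $\hat x$ satisfying an error bound of
the form (4.25), specifically, the bound

$$\forall (x \in \mathbb{R}^n, u \in \mathcal U):\ \mathrm{Prob}\left\{\xi:\ \|B[x - \hat x(Ax + u + \xi)]\|_\infty \le \alpha + S^{-1}L_1(Bx - [Bx]^S)\right\} \ge 1-\epsilon \tag{4.26}$$

for some $\alpha > 0$. Then there exist $H = [h^1,...,h^N] \in \mathbb{R}^{m\times N}$ and $V = [V^{k\ell} \in \mathbb{R}^{n_k\times n_\ell}]_{k,\ell=1}^K$ satisfying

$$\begin{array}{ll} (a) & B = VB + H^TA,\\ (b) & \|V^{k\ell}\|_{\infty,\infty} \le 2S^{-1}\ \ \forall k,\ell \le K,\\ (c) & \text{with } \rho := \max_{1\le i\le N}\left[\max_{u\in\mathcal U} u^Th^i + \mathrm{Erfinv}\left(\frac{\epsilon}{2N}\right)\|D^Th^i\|_2\right] \end{array} \tag{4.27}$$

one has $\rho \le 2\alpha$ when $D = 0$, $\rho \le 2\frac{\mathrm{Erfinv}(\epsilon/(2N))}{\mathrm{Erfinv}(\epsilon)}\alpha$ when $D \ne 0$, and for $\xi = D\eta$, $\eta \sim \mathcal N(0, I_m)$ one has

$$P\left\{\xi:\ \max_{u\in\mathcal U} u^Th^i + |\xi^Th^i| \le \rho,\ 1\le i\le N\right\} \ge 1-\epsilon.$$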



In other words (see Proposition 4.1), $(H, L_\infty(\cdot))$ satisfies $Q_{s,\infty}(\kappa)$ for $s$ "nearly as large as $S$," namely,
$s \le \frac{\kappa}{2}S$, and $H = [h^1,...,h^N]$, $\rho$ satisfy conditions (4.23) with $\rho$ being "nearly $\alpha$", namely, $\rho \le 2\alpha$ in the
case of $D = 0$ and $\rho \le 2\frac{\mathrm{Erfinv}(\epsilon/(2N))}{\mathrm{Erfinv}(\epsilon)}\alpha$ when $D \ne 0$. In particular, under the premise of Proposition 4.2, the
contrast optimization procedure of section 4.1 supplies the matrix H such that the corresponding regular or 



penalized recovery x{-) for all s < ^ satisfies: 



Prob<^ \\B\x 



< 4 



t -¥^a + s-^LUBx - [Bx] 

Erfinv(e) 



> 1 - e. 



5 Tractable approximations of $Q_{s,q}(\kappa)$

Aside from the important case $q = \infty$, $\|\cdot\|_{(k)} = \|\cdot\|_\infty$ considered in sections 4.1 and 4.2, condition $Q_{s,q}(\kappa)$
"as it is" seems to be computationally intractable: unless $s = O(1)$, it is unknown how to check efficiently
that a given pair $(H, \|\cdot\|)$ satisfies this condition, not to mention the synthesis of a pair satisfying this
condition and resulting in the best possible error bounds (3.12), (3.15) for regular and penalized $\ell_1$-recoveries.
We are about to present verifiable sufficient conditions for the validity of $Q_{s,q}(\kappa)$ which may serve as an
interesting substitute for condition $Q_{s,q}(\kappa)$ for these purposes.



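5.1 Sufficient condition for $Q_{s,q}(\kappa)$

Proposition 5.1 Suppose that a sensing matrix $A$ and an r.s. $\mathcal S = (B, n_1,...,n_K, \|\cdot\|_{(1)},...,\|\cdot\|_{(K)})$
are given.

Let $N = n_1 + ... + n_K$, and let an $N \times N$ matrix $V = [V^{k\ell}]_{k,\ell=1}^K$ ($V^{k\ell}$ are $n_k \times n_\ell$) and an $m \times N$ matrix $H$
satisfy the relation

$$B = VB + H^TA. \tag{5.28}$$

Let us denote

$$\nu^*_{s,q}(V) = \max_{1\le\ell\le K}\ \max_{w^\ell\in\mathbb{R}^{n_\ell}:\ \|w^\ell\|_{(\ell)}\le 1} L_{s,q}\left([V^{1\ell}w^\ell;...;V^{K\ell}w^\ell]\right). \tag{5.29}$$

Then for all $s \le K$ and all $q \in [1,\infty]$, we have:

$$L_{s,q}(Bx) \le s^{\frac{1}{q}}L_\infty(H^TAx) + \nu^*_{s,q}(V)L_1(Bx) \quad \forall x.$$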
The result of Proposition 5.1 is a step toward a verifiable sufficient condition for the validity of $Q_{s,q}$. To get such
a condition we need an efficiently computable upper bound on the quantity $\nu^*_{s,q}$. In particular, if for a given
positive integer $s \le K$ and a real $q \in [1,\infty]$ there exist an upper bounding function $\nu_{s,q}(V)$ such that

$$\nu_{s,q}(\cdot) \text{ is convex and } \nu_{s,q}(V) \ge \nu^*_{s,q}(V)\ \ \forall V \tag{5.30}$$

and a matrix $V$ such that

$$\nu_{s,q}(V) \le s^{\frac{1}{q}-1}\kappa, \tag{5.31}$$

then the pair $(H, L_\infty(\cdot))$ satisfies $Q_{s,q}(\kappa)$. An important example of the upper bound for $\nu^*_{s,q}(V)$ which
satisfies (5.31) is provided in the following statement.



Proposition 5.2 Let $\Omega$ be a $K \times K$ matrix with entries $[\Omega]_{k\ell} = \|V^{k\ell}\|_{(\ell,k)}$, $1 \le k,\ell \le K$. Then

$$\nu_{s,q}(V) := \max_{1\le k\le K}\|\mathrm{Col}_k[\Omega]\|_{s,q} \ge \nu^*_{s,q}(V) \tag{5.32}$$

(note that the inequality in (5.32) becomes equality when either $q = \infty$, or $s = 1$), so that the condition

$$\nu_{s,q}(V) \le s^{\frac{1}{q}-1}\kappa \tag{5.33}$$

is sufficient for $(H, L_\infty(\cdot))$ to satisfy $Q_{s,q}(\kappa)$.

When all $\|\cdot\|_{(k)}$ are the $\ell_\infty$-norms and $q = \infty$, the results of Propositions 5.1 and 5.2 recover Proposition 4.1.
In the general case, they suggest a way to synthesize matrices $H \in \mathbb{R}^{m\times N}$ which, taken along with the norm
$L_\infty(\cdot)$, provably satisfy the condition $Q_{s,q}(\kappa)$, along with a certificate $V$ for this fact. Namely,
$H$ and $V$ should satisfy the system of linear equations (5.28) and, in addition, (5.31) should hold for $V$ with
$\nu_{s,q}(\cdot)$ satisfying (5.30). Further, for such a $\nu_{s,q}(\cdot)$, (5.31) is a system of convex constraints on $V$. Whenever
these constraints are efficiently computable, we get a computationally tractable sufficient condition on $H$
to satisfy $Q_{s,q}(\kappa)$ - a condition which is expressed by an explicit system of efficiently computable convex
constraints (5.28), (5.31) on $H$ and additional matrix variable $V$.
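To make the certificate concrete, here is a minimal sketch of the test in (5.33), starting from a matrix $\Omega$ of block operator norms that is assumed to be already computed. It also assumes that $\|u\|_{s,q}$ denotes the $\ell_q$-norm of the $s$ largest-magnitude entries of $u$ (the definition lies outside this section); the helper names are ours.

```python
from math import inf


def s_q_norm(u, s, q):
    """Assumed ||u||_{s,q}: l_q norm of the s largest-magnitude entries of u."""
    top = sorted((abs(t) for t in u), reverse=True)[:s]
    if q == inf:
        return top[0]
    return sum(t ** q for t in top) ** (1.0 / q)


def nu_sq(omega, s, q):
    """Upper bound (5.32): the largest ||Col[Omega]||_{s,q} over all columns."""
    return max(s_q_norm(col, s, q) for col in zip(*omega))


def certifies(omega, s, q, kappa):
    """Check the sufficient condition (5.33): nu_{s,q}(V) <= s^(1/q - 1) * kappa."""
    bound = kappa / s if q == inf else s ** (1.0 / q - 1.0) * kappa
    return nu_sq(omega, s, q) <= bound
```

A call such as `certifies(omega, s=2, q=1, kappa=0.4)` only checks the norm condition; the linear relation (5.28) between $H$ and $V$ has to be verified separately.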



5.2 Tractable sufficient conditions and contrast optimization

The quantity $\nu_{s,q}(\cdot)$ defined in (5.32) is the simplest choice of $\nu_{s,q}(\cdot)$ satisfying (5.30). In this case, efficient
computability of the constraints (5.31) is the same as efficient computability of the norms $\|\cdot\|_{(k,\ell)}$. Assuming that
$\|\cdot\|_{(k)} = \|\cdot\|_{r_k}$ for every $k$, the computability issue becomes the one of efficient computation of the norms
$\|\cdot\|_{r_k,r_\ell}$. The norm $\|\cdot\|_{r,\theta}$ is known to be generically efficiently computable in only three cases:

1. $\theta = \infty$, where $\|M\|_{r,\infty} = \|M^T\|_{1,\frac{r}{r-1}} = \max_i \|\mathrm{Row}_i^T(M)\|_{\frac{r}{r-1}}$;

2. $r = 1$, where $\|M\|_{1,\theta} = \max_j \|\mathrm{Col}_j[M]\|_\theta$;

3. $r = \theta = 2$, where $\|M\|_{2,2} = \sigma_{\max}(M)$ is the spectral norm of $M$.
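The three tractable cases can be sketched in a few lines. The code below assumes the usual convention $\|M\|_{r,\theta} = \max\{\|Mx\|_\theta : \|x\|_r \le 1\}$ and uses a plain power iteration for the spectral norm; it is an illustration with hypothetical function names, not a tuned implementation.

```python
from math import inf, sqrt


def vec_norm(v, p):
    """l_p norm of a vector, p in [1, inf]."""
    if p == inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def conjugate(r):
    """Conjugate exponent r/(r-1), with the usual conventions at 1 and inf."""
    if r == 1:
        return inf
    if r == inf:
        return 1.0
    return r / (r - 1.0)


def op_norm(M, r, theta, iters=200):
    """||M||_{r,theta} in the three tractable cases listed in the text."""
    rows = [list(row) for row in M]
    cols = [list(col) for col in zip(*M)]
    if theta == inf:                      # case 1: largest dual norm of a row
        return max(vec_norm(row, conjugate(r)) for row in rows)
    if r == 1:                            # case 2: largest theta-norm of a column
        return max(vec_norm(col, theta) for col in cols)
    if r == 2 and theta == 2:             # case 3: power iteration on M^T M
        # Caveat: the start vector must not be orthogonal to the top singular vector.
        x = [1.0] * len(cols)
        for _ in range(iters):
            y = [sum(a * b for a, b in zip(row, x)) for row in rows]   # M x
            z = [sum(a * b for a, b in zip(col, y)) for col in cols]   # M^T (M x)
            nz = sqrt(sum(t * t for t in z))
            if nz == 0.0:
                return 0.0
            x = [t / nz for t in z]
        y = [sum(a * b for a, b in zip(row, x)) for row in rows]
        return sqrt(sum(t * t for t in y))  # ||M x||_2 with ||x||_2 = 1
    raise ValueError("not one of the three tractable cases")
```

For any other pair $(r,\theta)$ the function raises an error, mirroring the statement that no generic efficient procedure is known outside these cases.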

Assuming for the sake of simplicity that in our r.s. $\|\cdot\|_{(k)}$ are $r$-norms with common value of $r$, let us
look at three "tractable cases" as specified by the above discussion - those of $r = \infty$, $r = 1$ and $r = 2$. In
these cases, candidate contrast matrices $H$ are $m \times N$, the associated norm $\|\cdot\|$ is $L_\infty(\cdot)$, and our sufficient
condition for $H$ to be good (i.e., for $(H, L_\infty(\cdot))$ to satisfy $Q_{s,q}(\kappa)$ with given $\kappa < 1/2$ and $q$) becomes a
system $\mathcal S = \mathcal S_{\kappa,q}$ of explicit efficiently computable convex constraints on $H$ and additional matrix variable
$V \in \mathbb{R}^{N\times N}$, implying that the set $\mathcal H$ of good $H$ is convex and computationally tractable, so that we can
minimize efficiently over $\mathcal H$ any convex and efficiently computable function. In our context, a natural way
to use $\mathcal S$ is to optimize over $H \in \mathcal H$ the error bound (3.19), or, which is the same, to minimize over $H$ the
function $\rho(H) = \rho_\epsilon[H, L_\infty(\cdot)]$, see (3.18), where $\epsilon < 1$ is a given tolerance. Taken literally, this problem
still can be difficult, since the function $\rho(H)$ is not necessarily convex, and can be difficult to compute even
when convex. To overcome this difficulty, we again can use a verifiable sufficient condition for the relation
$\rho(H) \le \rho$, that is, a system $\mathcal T$ of explicit efficiently computable convex constraints on variables $H$ and $\rho$
(and, perhaps, some slack variables $\zeta$) such that $\rho(H) \le \rho$ for the $(H,\rho)$-component of every feasible solution
of $\mathcal T$. With this approach, design of the best, as allowed by $\mathcal S$ and $\mathcal T$, contrast matrix $H$ reduces to solving
a convex optimization problem with efficiently computable constraints in variables $H$, $V$, $\rho$, specifically, the
problem

$$\min_{\rho,H,V,\zeta}\left\{\rho:\ H, V \text{ satisfy } \mathcal S;\ H, \rho, \zeta \text{ satisfy } \mathcal T\right\}. \tag{5.34}$$

In the rest of this section, we present explicitly the systems $\mathcal S$ and $\mathcal T$ for the three tractable cases we are
interested in, assuming the following model of observation errors:

$$\mathcal U = \{u = Ev:\ \|v\|_2 \le 1\};\quad \xi = D\eta,\ \eta \sim \mathcal N(0, I_m),$$

where $E, D \in \mathbb{R}^{m\times m}$.

We use the following notation: the $m \times N$ matrix $H$ is partitioned into $m \times n_k$ blocks $H[k]$, $1 \le k \le K$,
according to the block structure of the representation vectors; the $t$-th column in $H[k]$ is denoted $h^{kt} \in \mathbb{R}^m$,
$1 \le t \le n_k$.

For derivations of the results to follow, see section A.7.



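The case of $r = \infty$. The case of $q = \infty$ was considered in full detail in section 4.1. When $q < \infty$, one
has:

$$\begin{array}{ll} \mathcal S_{\kappa,q}: & \left\{\begin{array}{l} B = VB + H^TA,\quad [\Omega]_{k\ell} = \|V^{k\ell}\|_{\infty,\infty} = \max_{1\le i\le n_k}\|\mathrm{Row}_i[V^{k\ell}]\|_1,\ 1\le k,\ell\le K,\\ \|\mathrm{Col}_\ell[\Omega]\|_{s,q} \le s^{\frac{1}{q}-1}\kappa,\ 1\le\ell\le K \end{array}\right.\\ \mathcal T: & \mathrm{Erfinv}\left(\frac{\epsilon}{2N}\right)\|D^Th^{kt}\|_2 + \|E^Th^{kt}\|_2 \le \rho,\ 1\le t\le n_k,\ 1\le k\le K. \end{array} \tag{5.35}$$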



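The case of $r = 2$. Here

$$\mathcal S_{\kappa,q}:\ \left\{\begin{array}{l} B = VB + H^TA,\quad [\Omega]_{k\ell} = \|V^{k\ell}\|_{2,2} = \sigma_{\max}(V^{k\ell}),\ 1\le k,\ell\le K,\\ \|\mathrm{Col}_\ell[\Omega]\|_{s,q} \le s^{\frac{1}{q}-1}\kappa,\ 1\le\ell\le K \end{array}\right.$$

$\mathcal T$: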

iETH[k])+ak<p 
Wk D^H[k] " 
H^[k]D akin, 

TtiWk) + 2 [Sl3k + ^6^ [31 + 25-fl 
6:=ln{K/e), 



hO, \\XiWk)\\oo<l3 + k, \\XiWk)h<lk \,l<k<K 



(5.36) 



where $\mathbf{S}^m$ is the space of $m \times m$ symmetric matrices, and $\lambda(W)$ is the vector of eigenvalues of $W \in \mathbf{S}^m$.



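The case of $r = 1$. Here

$$\mathcal S_{\kappa,q}:\ \left\{\begin{array}{l} B = VB + H^TA,\quad [\Omega]_{k\ell} = \|V^{k\ell}\|_{1,1} = \max_{1\le t\le n_\ell}\|\mathrm{Col}_t[V^{k\ell}]\|_1,\ 1\le k,\ell\le K,\\ \|\mathrm{Col}_\ell[\Omega]\|_{s,q} \le s^{\frac{1}{q}-1}\kappa,\ 1\le\ell\le K \end{array}\right.$$

$\mathcal T$: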
3{A^' GM™,/ > 0}f^i : 



Erfinv 



k 



Diag{A'=} H^[K]E 

E^H[k] pHn, 



>- 



'i{k <K,t< Uk). 



(5.37) 






5.3 Tractable sufficient conditions: limits of performance

Consider the situation where all the norms $\|\cdot\|_{(k)}$ are $\|\cdot\|_r$, with $r \in \{1,2,\infty\}$. A natural question about
verifiable sufficient conditions for a pair $(H, L_\infty(\cdot))$ to satisfy $Q_{s,q}(\kappa)$ is, what are the "limits of performance"
of these sufficient conditions. Specifically, how large could be the range of $s$ for which the condition can be
satisfied by at least one contrast matrix. Here is a partial answer to this question:

Proposition 5.3 Let $A$ be an $m \times n$ sensing matrix which is "essentially non-square," specifically, such
that $2m \le n$, let $B = I$ and let $n_k = d$, $\|\cdot\|_{(k)} = \|\cdot\|_r$, $1 \le k \le K$, with $r \in \{1,2,\infty\}$. Whenever an $m \times N$
matrix $H$ and an $N \times N$ matrix $V$ satisfy the conditions

$$I = V + H^TA \quad\text{and}\quad \max_{1\le k,\ell\le K}\|V^{k\ell}\|_{r,r} \le \frac{1}{2}s^{-1} \tag{5.38}$$

(cf. (5.28), (5.32), and (5.31)), one has

$$s \le \frac{3\sqrt{m}}{2\sqrt{d}}. \tag{5.39}$$



Discussion. Let the r.s. in question be the same as in Proposition 5.3 and let the $m \times n$ sensing matrix $A$
have $2m \le n$. Proposition 5.3 says that in this case, the verifiable sufficient condition stated by Proposition
5.1 for satisfiability of $Q_{s,q}(\kappa)$ with $\kappa < 1/2$ has rather restricted scope - it cannot certify the satisfiability
of $Q_{s,q}(\kappa)$, $\kappa < 1/2$, when $s > \frac{3\sqrt{m}}{2\sqrt{d}}$. Yet, the condition $Q_{s,q}(\kappa)$ may be satisfiable in a much larger range

of values of s. For instance, when the r.s. in question is the standard one, and A is a random Gaussian 
mxn matrix, the matrix A satisfies, with overwhelming probability as m,n grow, the RIP(^,,s) condition 



for s as large as 0(l)m/ y^ln(n/m) (cf. |8]). By Proposition 2.1, this implies that (|A, || • ||oo) satisfies the 
condition Qs,2(|) in the essentially the same large range of s. There is, however, an important case where 
the "limits of performance" of our verifiable sufficient condition for the satisfiability of $Q_{s,q}(\kappa)$ imply severe
restrictions on the range of values of $s$ in which the "true" condition $Q_{s,q}(\kappa)$ is satisfiable - this is the case
when $q = \infty$ and $r = \infty$. Combining Propositions 4.1 and 5.3 we conclude that in the case of the r.s. from
Proposition 5.3 with $r = \infty$ and a "sufficiently non-square" ($2m \le n$) $m \times n$ sensing matrix $A$, the associated
condition $Q_{s,\infty}(\kappa)$ with $\kappa < 1/2$ cannot be satisfied when $s > \frac{3\sqrt{m}}{2\sqrt{d}}$.
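To get a feeling for the scale of the bound (5.39), here is a worked instance with hypothetical sizes chosen only for illustration (they satisfy the requirement $2m \le n$ of Proposition 5.3 and are not taken from the experiments of this paper):

```latex
% Hypothetical sizes: m = 100 observations, n = 400 columns, blocks of size d = 4.
\[
  s \;\le\; \frac{3\sqrt{m}}{2\sqrt{d}}
    \;=\; \frac{3\cdot 10}{2\cdot 2}
    \;=\; 7.5 ,
  \qquad\text{so the sufficient condition can certify at most } s = 7 .
\]
```

Quadrupling $m$ at fixed $d$ only doubles this ceiling, which is the square-root behavior discussed in the text.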

5.4 Tractable sufficient conditions and Mutual Block-Incoherence

We have mentioned in the Introduction that, to the best of our knowledge, the only previously proposed verifiable
sufficient condition for the validity of block-$\ell_1$ recovery is the "mutual block-incoherence condition" [15].
Our immediate goal is to show that this condition is covered by Proposition 5.1.

Consider an r.s. with $B = I$ and with $\ell_2$-norms in the role of $\|\cdot\|_{(k)}$, $1 \le k \le K$, and let the sensing
matrix $A$ in question be partitioned as $A = [A[1],...,A[K]]$, where $A[k]$ has $n_k$ columns. Let us define the
mutual block-incoherence $\mu$ of $A$ w.r.t. the r.s. in question as follows:

$$\mu = \max_{1\le k,\ell\le K,\ k\ne\ell}\sigma_{\max}\left(C_k^{-1}A^T[k]A[\ell]\right),\quad [C_k := A^T[k]A[k]] \tag{5.40}$$

provided that all matrices $C_k$, $1 \le k \le K$, are nonsingular, otherwise $\mu = \infty$. Note that in the case of
the standard r.s., the just defined quantity is nothing but the standard mutual incoherence known from the
Compressed Sensing literature (see, e.g., [13]).

In [15], the authors consider the same r.s. and assume that $n_k = d$, $1 \le k \le K$, and that the columns
of $A$ are of unit $\|\cdot\|_2$-norm. They introduce the quantities

$$\nu = \max_{1\le k\le K}\ \max_{1\le j\ne j'\le d}\left|\mathrm{Col}_j^T[A[k]]\,\mathrm{Col}_{j'}[A[k]]\right|,\quad \mu_B = \frac{1}{d}\max_{1\le k,\ell\le K,\ k\ne\ell}\sigma_{\max}\left(A^T[k]A[\ell]\right) \tag{5.41}$$





and prove that an appropriate version of block-$\ell_1$ recovery allows to recover exactly every $s$-block-sparse
signal $x$ from the noiseless observations $y = Ax$, provided that

$$1 - (d-1)\nu > 0 \quad\text{and}\quad s < \frac{1 - (d-1)\nu + d\mu_B}{2d\mu_B}. \tag{5.42}$$

The following observation is almost immediate:

Proposition 5.4 Given an $m \times n$ sensing matrix $A$ and an r.s. $\mathcal S$ with $B = I$, $\|\cdot\|_{(k)} = \|\cdot\|_2$, $1 \le k \le K$, let
$A = [A[1],...,A[K]]$ be the corresponding partition of $A$.

(i) Let $\mu$ be the mutual block-incoherence of $A$ w.r.t. $\mathcal S$. Assuming $\mu < \infty$, we set

$$H = \frac{1}{1+\mu}\left[A[1]C_1^{-1}, A[2]C_2^{-1},...,A[K]C_K^{-1}\right],\quad C_k = A^T[k]A[k]. \tag{5.43}$$

Then the contrast matrix $H$ along with the matrix $V = I - H^TA$ satisfies condition (5.28) (where $B = I$)
and condition (5.33) with $q = \infty$ and

$$\kappa = \frac{\mu s}{1+\mu}.$$



As a result, applying Proposition 5.1, we conclude that whenever

$$s < \frac{1+\mu}{2\mu}, \tag{5.44}$$

the pair $(H, L_\infty(\cdot))$ satisfies $Q_{s,\infty}(\kappa)$ with $\kappa = \frac{\mu s}{1+\mu} < 1/2$.

(ii) Suppose that $n_k = d$, $k = 1,...,K$, and let the quantities $\nu$ and $\mu_B$ defined in (5.41) satisfy the relations
(5.42). Then the mutual block-incoherence of $A$ w.r.t. the r.s. in question does not exceed $\bar\mu = \frac{d\mu_B}{1-(d-1)\nu}$.
Further, we have $\frac{1+\bar\mu}{2\bar\mu} = \frac{1-(d-1)\nu+d\mu_B}{2d\mu_B}$, so that (5.44) holds, and thus ensures that the contrast $H$, as defined in (5.43),
and $L_\infty(\cdot)$ satisfy $Q_{s,\infty}(\kappa)$ with some $\kappa < \frac{1}{2}$.
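A minimal sketch of the construction in Proposition 5.4 (i), restricted for brevity to the standard r.s. (one-element blocks, $d = 1$), where $\mu$ is the usual mutual incoherence and $C_k$ is the scalar $a_k^Ta_k$. The function names are ours, and the restriction to $d = 1$ is a simplification for illustration only.

```python
from math import ceil, inf


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mutual_incoherence(A):
    """mu of (5.40) for one-element blocks: max_{i != j} |a_i^T a_j| / (a_i^T a_i)."""
    cols = list(zip(*A))
    mu = 0.0
    for i, ai in enumerate(cols):
        cii = dot(ai, ai)
        if cii == 0.0:
            return inf            # singular C_i: mu is infinite by convention
        for j, aj in enumerate(cols):
            if i != j:
                mu = max(mu, abs(dot(ai, aj)) / cii)
    return mu


def contrast(A, mu):
    """Contrast (5.43) for one-element blocks: column i of H is a_i / ((1+mu) a_i^T a_i)."""
    h_cols = [[t / ((1.0 + mu) * dot(a, a)) for t in a] for a in zip(*A)]
    return [list(row) for row in zip(*h_cols)]    # back to row-major m x n


def largest_certified_s(mu):
    """Largest integer s with s < (1 + mu) / (2 mu), cf. (5.44)."""
    if mu == 0.0:
        return inf                # no restriction when the columns are orthogonal
    return ceil((1.0 + mu) / (2.0 * mu)) - 1
```

With $V = I - H^TA$, every entry of $V$ is then bounded in magnitude by $\mu/(1+\mu)$, which is exactly what condition (5.33) with $q = \infty$ asks for when $\kappa = \mu s/(1+\mu)$.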

Let $A = [A_{ij}] \in \mathbb{R}^{m\times n}$ be a random matrix with i.i.d. entries $A_{ij} \sim \mathcal N(0, m^{-1})$. We have the following
simple result.



Proposition 5.5 Assume that $B = I$, $n_k = d$ and $\|\cdot\|_{(k)} = \|\cdot\|_2$ for all $k$. There are absolute constants
$C_1, C_2 < \infty$ (the corresponding bounds are provided in appendix A.10) such that if $m \ge C_1(d + \ln(n))$,
then the mutual block-incoherence n of A satisfies with probability at least 1 — ^ 



d + ln(n) 



m 



(5.45) 



The bound (5.45), along with Proposition 5.4 (i), implies that when $A$ is a Gaussian matrix, all block norms
are the $\ell_2$-norms and all $n_k = d$ with $d$ "large enough" (such that $d^{-1}\ln n = O(1)$), the verifiable sufficient
condition for $Q_{s,\infty}(\kappa)$ with $\kappa < 1/2$ holds with overwhelming probability for $s = O(\sqrt{m/d})$. In other words, in this case
the (verifiable!) condition $Q_{s,\infty}(\kappa)$ attains (up to an absolute factor) the limit of performance stated in
Proposition 5.3.



6 Matching pursuit algorithm for block recovery 



The Matching Pursuit algorithm for block-sparse recovery is motivated by the desire to provide a reduced 
complexity alternative to the algorithms using £1 -minimization. Several implementations of Matching Pur- 
suit for block-sparse recovery have been proposed in the Compressed Sensing literature [3tHlll5|ll6j. In this 



section we aim to show that a pair $H$, $V$ satisfying (5.28) and (5.31) with $\kappa < 1/2$ (and thus, by Proposition
5.1, such that $(H, L_\infty(\cdot))$ satisfies $Q_{s,\infty}(\kappa)$) can be used to design a specific version of the Matching Pursuit
algorithm which we refer to as the Non-Euclidean Block Matching Pursuit (NEBMP) algorithm for block-sparse
recovery.

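We fix an r.s. $\mathcal S = (B, n_1,...,n_K, \|\cdot\|_{(1)},...,\|\cdot\|_{(K)})$ and assume that the block norms $\|\cdot\|_{(k)}$, $k = 1,...,K$,
are either $\|\cdot\|_\infty$- or $\|\cdot\|_2$-norms. Furthermore, we suppose that the matrix $B$ is of full row rank, so that given
$z \in \mathbb{R}^N$ one can compute $x$ such that $z = Bx$ (e.g., $x = B^+z$ where $B^+ = B^T(BB^T)^{-1}$ is the pseudo-inverse
of $B$). Let the noise $\xi$ in the observation $y = Ax + u + \xi$ be Gaussian, $\xi \sim \mathcal N(0, D)$, $D \in \mathbb{R}^{m\times m}$ known.
Finally, we assume that we are in the situation of section 5.2, that is, we have at our disposal an $m \times N$,
$N = n_1 + ... + n_K$, matrix $H$, an $N \times N$ block matrix $V = [V^{k\ell} \in \mathbb{R}^{n_k\times n_\ell}]_{k,\ell=1}^K$, a $\gamma > 0$ and $\rho \ge 0$ such that

$$\begin{array}{ll} (a) & B = VB + H^TA,\\ (b) & \|V^{k\ell}\|_{(\ell,k)} \le \gamma\ \ \forall k,\ell \le K,\\ (c) & \mathrm{Prob}\left\{\Xi^+ := \{\xi:\ L_\infty(H^T[u+\xi]) \le \rho\ \ \forall u \in \mathcal U\}\right\} \ge 1-\epsilon. \end{array} \tag{6.46}$$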

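Given observation $y$, a positive integer $s$ and a real $\upsilon \ge 0$ ($\upsilon$ is our guess for an upper bound on
$L_1(Bx - [Bx]^s)$), consider Algorithm 1 below. Its convergence analysis is based upon the following:

Algorithm 1 Non-Euclidean Block Matching Pursuit

1. Initialization: Set $v^{(0)} = 0$, $\alpha_0 = \frac{L_{s,1}(H^Ty) + s\rho + \upsilon}{1 - s\gamma}$.

2. Step $k$, $k = 1,2,...$: Given $v^{(k-1)} \in \mathbb{R}^n$ and $\alpha_{k-1} \ge 0$, compute

(a) $g = H^T(y - Av^{(k-1)})$ and vector $\Delta = [\Delta[1];...;\Delta[K]] \in \mathbb{R}^N$ by setting for $j = 1,...,K$:

$$\begin{array}{ll} \Delta[j] = \frac{g[j]}{\|g[j]\|_2}\left[\|g[j]\|_2 - \gamma\alpha_{k-1} - \rho\right]_+, & \text{if } \|\cdot\|_{(j)} = \|\cdot\|_2;\\ \Delta_{ji} = \mathrm{sign}(g_{ji})\left[|g_{ji}| - \gamma\alpha_{k-1} - \rho\right]_+,\ 1\le i\le n_j, & \text{if } \|\cdot\|_{(j)} = \|\cdot\|_\infty, \end{array} \tag{6.47}$$

where $w_{ji}$ is the $i$-th entry in the $j$-th block of a representation vector $w$.

(b) Choose $v^{(k)}$ such that $B(v^{(k)} - v^{(k-1)}) = \Delta$, set

$$\alpha_k = 2s\gamma\alpha_{k-1} + 2s\rho + \upsilon \tag{6.48}$$

and loop to step $k + 1$.

3. Output: the approximate solution found after $k$ iterations is $v^{(k)}$.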
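The steps above can be sketched as follows for the simplest setting $B = I$ (so that $v^{(k)} = v^{(k-1)} + \Delta$). This is a hedged illustration rather than the authors' implementation: it assumes that $L_{s,1}(w)$ is the sum of the $s$ largest block norms of $w$, that the initial value is $\alpha_0 = (L_{s,1}(H^Ty) + s\rho + \upsilon)/(1 - s\gamma)$ and that the shrinkage threshold in (6.47) is $\gamma\alpha_{k-1} + \rho$; the function name and argument layout are hypothetical.

```python
from math import sqrt


def nebmp(A, H, y, blocks, s, gamma, rho, upsilon, norm="l2", iters=20):
    """Sketch of Algorithm 1 with B = I.

    A, H   : m x n matrices as lists of rows
    blocks : list of index lists partitioning range(n)
    norm   : "l2" or "linf", the common block norm
    Returns the final iterate v and the final alpha.
    """
    m, n = len(A), len(A[0])

    def h_transpose(r):
        return [sum(H[i][j] * r[i] for i in range(m)) for j in range(n)]

    def block_norm(w, idx):
        vals = [abs(w[i]) for i in idx]
        return sqrt(sum(t * t for t in vals)) if norm == "l2" else max(vals)

    # Initialization: alpha_0 from the s largest block norms of H^T y (assumed L_{s,1}).
    g0 = h_transpose(y)
    top = sorted((block_norm(g0, idx) for idx in blocks), reverse=True)[:s]
    alpha = (sum(top) + s * rho + upsilon) / (1.0 - s * gamma)
    v = [0.0] * n

    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        g = h_transpose([y[i] - Av[i] for i in range(m)])
        thr = gamma * alpha + rho
        for idx in blocks:
            if norm == "l2":
                ng = block_norm(g, idx)
                scale = max(ng - thr, 0.0) / ng if ng > 0.0 else 0.0
                for i in idx:
                    v[i] += scale * g[i]          # block soft-thresholding
            else:
                for i in idx:
                    shrink = max(abs(g[i]) - thr, 0.0)
                    v[i] += shrink if g[i] > 0.0 else -shrink
        alpha = 2.0 * s * gamma * alpha + 2.0 * s * rho + upsilon   # update (6.48)
    return v, alpha
```

In a noiseless toy run with $A = H = I$ (so $V = 0$ and any $\gamma > 0$ is admissible), the iterates approach the true block-sparse signal geometrically, at the rate $2s\gamma$ suggested by (6.48).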



Lemma 6.1 In the situation of (6.46), let $s\gamma < 1$. Then whenever $\xi \in \Xi^+$, for every $x \in \mathbb{R}^n$ with
$L_1(Bx - [Bx]^s) \le \upsilon$ and every $u \in \mathcal U$, the following holds true.

When applying Algorithm 1 to $y = Ax + u + \xi$, the resulting approximations $Bv^{(k)}$ to $Bx$ and the quantities
$\alpha_k$ for all $k$ satisfy the relations

$(a_k)$ for all $1 \le j \le K$: $\|(Bv^{(k)} - Bx)[j]\|_{(j)} \le \|(Bx)[j]\|_{(j)}$,
$(b_k)$ $L_1(Bx - Bv^{(k)}) \le \alpha_k$ and $L_\infty(Bx - Bv^{(k+1)}) \le 2\gamma\alpha_k + 2\rho$.

Note that if $2s\gamma < 1$, then also $s\gamma < 1$, so that Lemma 6.1 is applicable. Furthermore, in this case, by
(6.48), the sequence $\alpha_k$ converges exponentially fast to the limit $\alpha_\infty := \frac{2s\rho+\upsilon}{1-2s\gamma}$:

$$L_1(Bv^{(k)} - Bx) \le \alpha_k = (2s\gamma)^k[\alpha_0 - \alpha_\infty] + \alpha_\infty.$$

Along with the second inequality of $(b_k)$ this implies the bounds:

$$L_\infty(Bv^{(k)} - Bx) \le 2\gamma\alpha_{k-1} + 2\rho \le \frac{\alpha_k}{s},$$

and since $L_p(w) \le L_1(w)^{\frac{1}{p}}L_\infty(w)^{\frac{p-1}{p}}$ for $1 \le p \le \infty$, we have

$$L_p(Bv^{(k)} - Bx) \le s^{\frac{1-p}{p}}\left[(2s\gamma)^k[\alpha_0 - \alpha_\infty] + \alpha_\infty\right].$$
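For completeness, the closed form of $\alpha_k$ used above follows from the recursion (6.48) by subtracting its fixed point; the short derivation below spells this out under the standing assumption $2s\gamma < 1$.

```latex
\begin{aligned}
\alpha_\infty &= 2s\gamma\,\alpha_\infty + 2s\rho + \upsilon
  \quad\Longleftrightarrow\quad
  \alpha_\infty = \frac{2s\rho+\upsilon}{1-2s\gamma},\\[2pt]
\alpha_k - \alpha_\infty &= 2s\gamma\,(\alpha_{k-1}-\alpha_\infty)
  \quad\Longrightarrow\quad
  \alpha_k = (2s\gamma)^k(\alpha_0-\alpha_\infty)+\alpha_\infty .
\end{aligned}
```

The same recursion gives $2\gamma\alpha_{k-1} + 2\rho = (\alpha_k - \upsilon)/s \le \alpha_k/s$, which is the step behind the $L_\infty$ bound.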



The bottom line here is as follows.



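Proposition 6.1 Suppose that a collection $(H, L_\infty(\cdot), \rho, \gamma, \epsilon)$ satisfies (6.46), and let the parameter $s$ of
Algorithm 1 satisfy $\kappa := s\gamma < 1/2$. Then for all $\xi \in \Xi^+$, $u \in \mathcal U$, $x \in \mathbb{R}^n$ such that $L_1(Bx - [Bx]^s) \le \upsilon$,
Algorithm 1 as applied to $y = Ax + u + \xi$ ensures that for every $t = 1,2,...$ one has

$$L_p(Bv^{(t)} - Bx) \le s^{\frac{1}{p}}\left[\frac{2\rho + s^{-1}\upsilon}{1-2\kappa} + (2\kappa)^t\left(\frac{s^{-1}(L_{s,1}(H^Ty)+\upsilon)+\rho}{1-\kappa} - \frac{2\rho + s^{-1}\upsilon}{1-2\kappa}\right)\right],\quad 1 \le p \le \infty$$

(cf. (4.21)).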



Note that Proposition 6.1 combined with Proposition 5.4 essentially covers the results of [15] on the properties
of the Matching Pursuit algorithm for block-sparse recovery proposed in this reference.



7 Numerical Illustration 

In the theoretical part of this paper we were looking at the situation where the sensing matrix $A$ and the
r.s. $(B, n_1,...,n_K, \|\cdot\|_{(1)},...,\|\cdot\|_{(K)})$ were given, and we were interested to understand

[A] whether $\ell_1$ recovery allows to recover the representations $Bx$ of all $s$-block-sparse signals with a given
$s$ in the absence of observation noise, and

[B] how to choose the best (resulting in the smallest possible error bounds) pair $(H, \|\cdot\|)$.^3

Note that different components of our setup have in fact different status. While in typical applications $A$,
$B$ and the block structure $n_1,...,n_K$ of the representation vectors may be thought of as conditioned by the
"problem's physics", this is not the case for the block norms $\|\cdot\|_{(k)}$. Their choice (which does affect the
recovery routines) appears to be unrelated to the model of the data.



The first goal of our experiments is to understand how to choose the block norms in order to validate
$\ell_1$ recovery for the largest possible value of the sparsity parameter $s$. The meaning we put into the word
"validate" is providing guarantees of small error of recovery of all $s$-block-sparse signals when the observation
error is small (which implies, of course, the exactness of the recovery in the case of noiseless observation).
Here we restrict ourselves to the case when all block norms are $\ell_r$-norms with common value of $r$ taken from
the range $\{1,2,\infty\}$. For the reasons explained in the earlier discussion, we consider here only the case of
the penalized $\ell_1$-recovery with an $m \times N$ contrast matrix $H$ (where, as always, $N = n_1 + ... + n_K$), $\|\cdot\| = L_\infty(\cdot)$
and with $\lambda = 2s$ (see (3.14)). Beside this, we assume, mainly for the sake of notational convenience, that
$B = I$.

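Let us fix $A \in \mathbb{R}^{m\times n}$, $B = I_n$, $K$, $n_1,...,n_K$ ($n_1 + ... + n_K = n =: N$). By Proposition 5.1, for every
matrix $H \in \mathbb{R}^{m\times n}$, setting

$$\begin{array}{l} V = [V^{k\ell} \in \mathbb{R}^{n_k\times n_\ell}]_{k,\ell=1}^K = I - H^TA,\quad \Omega^r(H) = \left[\|V^{k\ell}\|_{r,r}\right]_{k,\ell=1}^K,\\ \kappa_1^{s,r}(H) = \max_{1\le\ell\le K}\|\mathrm{Col}_\ell[\Omega^r(H)]\|_{s,1},\quad \kappa_\infty^{s,r}(H) = s\max_{1\le k,\ell\le K}[\Omega^r(H)]_{k,\ell}, \end{array} \tag{7.49}$$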



^3 Needless to say, the results presented so far do not pretend to provide full answers to these questions. Our verifiable
sufficient conditions for the "validity" of block-$\ell_1$ recovery supply only some lower bounds on the largest $s = s_*$ for which the
answer to [A] is positive. Similarly, aside from the case $q = \infty$, $\|\cdot\|_{(k)} = \|\cdot\|_\infty$, $1 \le k \le K$, our conditions for the validity of
block-$\ell_1$ recovery are only sufficient, meaning that optimizing the error bound over $(H, \|\cdot\|)$ allowed by these conditions may
only yield suboptimal recovery routines.

^4 These are exactly the pairs $(H, \|\cdot\|)$ covered by the sufficient conditions for the validity of $\ell_1$-recovery, see Proposition 5.1.
the pair $(H, L_\infty(\cdot))$ satisfies the conditions $Q_{s,q}(\kappa_q^{s,r}(H))$, $q = 1$ and $q = \infty$, provided that the block norms
are the $\ell_r$-ones. In particular, when $\kappa_1^{s,r}(H) < 1/2$, the penalized recovery (i.e. the recovery (3.10) with
all block norms being the $\ell_r$-ones) "is valid" on $s$-block-sparse signals, meaning exactly that this recovery
ensures the validity of the error bounds (3.17) with $q = \infty$, $\varkappa = \kappa_1^{s,r}(H)$, $\kappa = \kappa_\infty^{s,r}(H)$ (and, in particular, recovers
exactly all $s$-block-sparse signals when there is no observation noise).



Our strategy is as follows. For each value of $r \in \{1,2,\infty\}$, we consider the convex optimization problem

$$\min_H\left\{\kappa_1^{s,r}(H) := \max_\ell\|\mathrm{Col}_\ell[\Omega^r(H)]\|_{s,1}\right\},$$

find the largest $s = s(r)$ for which the optimal value in this problem is $< 1/2$, and denote by $H^{(r)}$, $r \in
\{1,2,\infty\}$, the corresponding optimal solution. To these "marked" contrast matrices we add two more
contrasts, $H^{(MI)}$ and $H^{(MBI)}$, based on the mutual block-incoherence condition and given by the calculation
(5.40) for the cases of the "standard" (1-element blocks in $x = Bx$) and the actual block structures, respectively.

Now, given the set $\mathcal H = \{H^{(MI)}, H^{(MBI)}, H^{(1)}, H^{(2)}, H^{(\infty)}\}$ of $m \times n$ candidate contrast matrices, we
can choose the "most powerful" penalized $\ell_1/\ell_r$ recovery suggested by $\mathcal H$ as follows: for every $H \in \mathcal H$
and for every $p \in \{1,2,\infty\}$, we find the largest $s = s(H,p)$ for which $\kappa_1^{s,p}(H) < 1/2$, and then define the
quantity $s_* = s_*(\mathcal H) = \max\{s(H,p) : H \in \mathcal H,\ p \in \{1,2,\infty\}\}$ along with $H_* \in \mathcal H$ and $p_* \in \{1,2,\infty\}$ such
that $s_* = s(H_*,p_*)$. The penalized $\ell_1/\ell_{p_*}$ recovery utilizing the contrast matrix $H_*$ and the norm $L_\infty(\cdot)$
associated with block norms $\|\cdot\|_{p_*}$ of the blocks is definitely valid for $s = s_*(\mathcal H)$, and this is the largest
sparsity range, as certified by our sufficient conditions for the validity of $\ell_1/\ell_r$ recovery, which we can get
with contrast matrices from $\mathcal H$. Note that $s_* \ge \max[s(1), s(2), s(\infty)]$, that is, the resulting range of values
of $s$ is also the largest we can certify using our sufficient conditions, with no restriction on the contrast
matrices.
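The search for the largest certified sparsity can be sketched as follows, taking as input a matrix $\Omega$ of block norms of $V = I - H^TA$ that is assumed to be already computed. The sketch assumes that $\|u\|_{s,1}$ is the sum of the $s$ largest entries of the nonnegative vector $u$; the function names are ours.

```python
def kappa_1(omega, s):
    """kappa_1^{s,r}: largest sum of the s largest entries in a column of Omega."""
    return max(sum(sorted(col, reverse=True)[:s]) for col in zip(*omega))


def kappa_inf(omega, s):
    """kappa_inf^{s,r}: s times the largest entry of Omega."""
    return s * max(max(row) for row in omega)


def largest_certified_s(omega, level=0.5):
    """Largest s in 1..K with kappa_1 < level (0 if even s = 1 fails).

    kappa_1 is nondecreasing in s because the entries of Omega are norms,
    so the scan can stop at the first failure.
    """
    best = 0
    for s in range(1, len(omega) + 1):
        if kappa_1(omega, s) < level:
            best = s
        else:
            break
    return best
```

Running this for each candidate contrast and each $r$ and taking the maximum gives the quantity $s_*(\mathcal H)$ described above.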



Implementation. We have implemented the outlined strategy with the setup as follows. 

• the sizes of the sensing matrices $A$ were $(m = 96) \times (n = 128)$, with $B = I$ and $K = 32$ four-element
blocks in $Bx = x$;

• the $96 \times 128$ sensing matrices $A$ were built as follows: we first draw a matrix at random from one of
the following distributions:

- type H: randomly selected $96 \times 128$ submatrix of the $128 \times 128$ Hadamard matrix,^5

- type G: $96 \times 128$ matrix with independent $\mathcal N(0,1)$ entries,

- type R: $96 \times 128$ matrix with independent entries taking values $\pm 1$ with equal probabilities,

— type T: random 96 x 128 matrix of the structure arising in Multi-Task Learning (see, e.g., p] and 
references therein): the consecutive 4-column parts of the matrix are block-diagonal with four
$24 \times 1$ diagonal blocks with independent $\mathcal N(0,1)$ entries,

and then scale the columns of the selected matrix to have their $\|\cdot\|_2$-norms equal to 1;
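As an illustration of the type H construction, the sketch below builds the Hadamard matrix by the standard doubling recurrence, keeps a random subset of its rows and normalizes the columns. Reading "randomly selected submatrix" as a uniformly random choice of rows is our assumption, as are the function names and the seed parameter.

```python
import random
from math import sqrt


def hadamard(k):
    """Hadamard matrix of order 2^k via H_0 = [1], H_{k+1} = [[H_k, H_k], [H_k, -H_k]]."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-t for t in row] for row in H]
    return H


def sensing_matrix_type_h(m, k, seed=0):
    """Pick m random rows of the 2^k x 2^k Hadamard matrix, then scale columns to unit l2 norm."""
    rng = random.Random(seed)
    full = hadamard(k)
    chosen = sorted(rng.sample(range(len(full)), m))
    A = [[float(t) for t in full[i]] for i in chosen]
    for j in range(len(A[0])):
        nrm = sqrt(sum(A[i][j] ** 2 for i in range(m)))
        for i in range(m):
            A[i][j] /= nrm
    return A
```

For the sizes used in the text one would call `sensing_matrix_type_h(96, 7)`, which yields a $96 \times 128$ matrix with entries $\pm 1/\sqrt{96}$.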



The results we report describe 4 experiments differing from each other by the type of the (randomly
selected) matrix $A$.^6

^5 The Hadamard matrices $H_k$ of order $2^k \times 2^k$, $k = 0,1,...$, are given by the recurrence $H_0 = 1$, $H_{k+1} = [H_k, H_k; H_k, -H_k]$.
They are symmetric matrices with $\pm 1$ entries and rows orthogonal to each other.

^6 As far as our experience shows, the results remain nearly the same across instances of $A$ drawn from the same distribution,
so that only one experiment for each type of distributions in question appears to be representative enough.
A   r     H(MI)                 H(MBI)                H(1)                  H(2)                  H(oo)                 s(r)
H   1     2 | 0.4727 | 0.509    2 | 0.444 | 0.460     3 | 0.429 | 0.429     2 | 0.487 | 0.519     3 | 0.429 | 0.429     4
H   2     2 | 0.436 | 0.436     2 | 0.429 | 0.429     3 | 0.429 | 0.429     3 | 0.429 | 0.429     3 | 0.429 | 0.429     3
H   oo    2 | 0.473 | 0.509     2 | 0.444 | 0.460     3 | 0.429 | 0.429     2 | 0.487 | 0.519     3 | 0.429 | 0.429     3
G   1     0 | 0.000 | 0.000     0 | 0.000 | 0.000     3 | 0.467 | 0.900     1 | 0.301 | 0.301     1 | 0.489 | 0.489     5
G   2     0 | 0.000 | 0.000     1 | 0.368 | 0.368     1 | 0.300 | 0.300     3 | 0.447 | 0.458     2 | 0.479 | 0.549     5
G   oo    0 | 0.000 | 0.000     0 | 0.000 | 0.000     0 | 0.000 | 0.000     1 | 0.305 | 0.305     3 | 0.483 | 0.823     4
R   1     0 | 0.0000 | 0.000    0 | 0.000 | 0.000     3 | 0.477 | 0.853     1 | 0.291 | 0.291     1 | 0.498 | 0.498     5
R   2     0 | 0.000 | 0.000     1 | 0.354 | 0.354     1 | 0.284 | 0.284     3 | 0.438 | 0.440     1 | 0.264 | 0.264     5
R   oo    0 | 0.000 | 0.000     0 | 0.000 | 0.000     1 | 0.482 | 0.482     1 | 0.286 | 0.286     3 | 0.489 | 0.739     5
T   1     1 | 0.384 | 0.384     1 | 0.399 | 0.399     2 | 0.383 | 0.383     2 | 0.383 | 0.383     2 | 0.383 | 0.383     3
T   2     1 | 0.384 | 0.384     1 | 0.399 | 0.399     2 | 0.383 | 0.383     2 | 0.383 | 0.383     2 | 0.383 | 0.383     3
T   oo    1 | 0.384 | 0.384     1 | 0.399 | 0.399     2 | 0.383 | 0.383     2 | 0.383 | 0.383     2 | 0.383 | 0.383     3

Table 1: Certified sparsity levels for penalized $\ell_1/\ell_r$-recoveries for candidate contrast matrices. For each
candidate and each value of $r$ we present in the corresponding cells the triple $s(H,r) \mid \kappa_1^{s(H,r),r}(H) \mid \kappa_\infty^{s(H,r),r}(H)$.
$s(r)$: a computed upper bound on the $r$-goodness $s_*(A,r)$ of $A$. Underlined in red: the best sparsity $s_*(\mathcal H)$
certified by our sufficient conditions for the validity of penalized recovery.

In Table 1 we display the certified sparsity levels of penalized $\ell_1/\ell_r$ recoveries for the candidate contrast
matrices. In addition, we present valid upper bounds $s(r)$ on the "$r$-goodness" $s_*(A,r)$ of $A$, defined as
the largest $s$ such that the $\ell_1/\ell_r$-recovery in the noiseless case recovers exactly the representations of all
$s$-block-sparse vectors, that is,

$$s_*(A,r) = \max\left\{s:\ x = \mathop{\mathrm{argmin}}_{z\in\mathbb{R}^n}\left\{\sum_{k=1}^K\|z[k]\|_r:\ Az = Ax\right\} \text{ for all } s\text{-block-sparse } x\right\}.$$

We present on Figure 1 examples of "bad" signals (i.e., $(s(r)+1)$-block-sparse signals which are not recovered
correctly by the latter procedure).^7

On the basis of this experiment we can make two tentative conclusions:

• the $\ell_1/\ell_2$ recovery with the contrast matrix $H^{(2)}$ and the $\ell_1/\ell_\infty$ recovery with the contrast matrix $H^{(\infty)}$
were able to certify the best levels of allowed sparsity (when compared to other candidate matrices
from $\mathcal H$);

• in our experiments, the upper bounds $s(r)$ on the $r$-goodness $s_*(A,r)$ of $A$ are close to the corresponding
certified lower bounds $s_*(\mathcal H,r) = \max_{H\in\mathcal H}s(H,r)$.



Numerical evaluation of recovery errors. The objective of the next experiment is to evaluate the
accuracy of penalized $\ell_1/\ell_r$ recoveries in the noisy setting. We consider the contrast matrices from $\mathcal H =
\{H^{(MI)}, H^{(MBI)}, H^{(1)}, H^{(2)}, H^{(\infty)}\}$ above. Note that it is possible to improve the error bound by optimizing
it over $H$ as it was done in section 5.2. In the experiments to be reported this additional optimization,
however, did not yield a significant improvement (which perhaps reflects the "nice conditioning" of the
sensing matrices we dealt with), and we do not present the simulation results for optimized contrasts here.

• We ran four series of simulations corresponding to the four instances of the sensing matrix $A$ we used.
The series associated with a particular $A$ was as follows:

^7 It is immediately seen that whenever $B$ is of full row rank, the nullspace property "$L_{s,1}(Bx) < \frac{1}{2}L_1(Bx)$ for all $x \in \mathrm{Ker}A$
with $Bx \ne 0$" is necessary for $s$ to be $\le s_*(A,\cdot)$. As a result, for $B$'s of full row rank, $s_*(A,r)$ can be upper-bounded in a
manner completely similar to the case of the standard r.s., see [23, section 4.1].
[Figure 1, three panels: $r = 1$, $s(1) = 4$; $r = 2$, $s(2) = 3$; $r = \infty$, $s(\infty) = 3$.]

Figure 1: "Bad" $(s(r)+1)$-block-sparse signals (blue) and their $\ell_1/\ell_r$ recoveries (red) from noiseless
observations, H-matrix $A$.

• Given $A$, we associate with it the five aforementioned candidate contrast matrices from $\mathcal H$. Combining
these matrices with 3 values of $r$ ($r = 1,2,\infty$), we get 15 recovery routines. We augmented these 15
routines by the block Lasso recovery as described in [26]. In our notation, this recovery is (cf. [26,
(2.2)])

SLasso(y) G Argmin \ -\\Az - yg + 2^2 ^k\\z[k]\\2 \ (7.50) 

($z[k]$, $1 \le k \le K$, are the blocks in $z = Bz$), with the penalty coefficients chosen according to
the equality version of the relations in [26, Theorem 3.1] used with $q = 2$.

Each of the 16 resulting recovery routines was tested on two samples, each of size 100, of randomly generated
recovery problems as follows. In each problem the true signal was randomly generated with $s$ nonzero
blocks, and the observations were affected by pure Gaussian white noise: $y = Ax + \sigma\xi$, $\xi \sim \mathcal N(0,I)$. In
the first sample, $s$ was set to the best value $s_*(\mathcal H)$ of block sparsity we were able to certify; in the second,
$s = 2s_*(\mathcal H)$ was used. The parameter $\lambda$ in the penalized recoveries was set to $2s$ (and thus was tuned to the
actual sparsity of test signals). In both samples, we used $\sigma = 0.001$.

We compare the recovery routines on the basis of their ratings computed as follows: given a recovery
problem from the sample, we applied to it every one of our 16 recovery routines and measured the 16
resulting $\|\cdot\|_\infty$-errors. Dividing the smallest of these errors by the error of a given routine we obtain "the
rating" of the routine in this particular simulation. Thus, all ratings are $\le 1$; the routine which attains the
best $\|\cdot\|_\infty$-recovery error for the current data is rated "1.0". For the remaining routines, the closer to 1 is
the rating of the routine, the closer is the routine to the "winner" of the current simulation. The final rating
of a given recovery routine is its average rating over all $800 = 4 \times 2 \times 100$ recovery problems processed in
the experiment.
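The rating rule just described amounts to the following small computation; the function name and the list-of-lists layout of the errors are our choices for illustration.

```python
def average_ratings(errors):
    """errors[p][r] is the error of routine r on problem p.

    The rating of a routine on one problem is (smallest error) / (its error);
    the final rating is the average over all problems.
    """
    n_routines = len(errors[0])
    totals = [0.0] * n_routines
    for errs in errors:
        best = min(errs)
        for r, e in enumerate(errs):
            # The winner gets exactly 1.0 (this also covers a zero best error).
            totals[r] += 1.0 if e == best else best / e
    return [t / len(errors) for t in totals]
```

For example, with two problems and three routines whose errors are `[1, 2, 4]` and `[3, 1.5, 6]`, the per-problem ratings are `[1, 0.5, 0.25]` and `[0.5, 1, 0.25]`, so the averages are `[0.75, 0.75, 0.25]`.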

The resulting ratings are presented in Table 2. The "winner" is the routine associated with $r = 2$ and
$H = H^{(2)}$. Surprisingly, the second best routine is associated with the same $r = 2$ and the simplest contrast
$H^{(MI)}$, an outsider in terms of the data presented in Table 1. This inconsistency may be explained by the fact
that the data in Table 1 describe the guaranteed worst-case behavior of our recovery routines, which may
be quite different from their "average behavior", reflected by Table 2. Our tentative conclusion on the basis
of the data from Tables 1, 2 is that the penalized $\ell_1/\ell_2$ recovery associated with the contrast matrix $H^{(2)}$
may be favored when recovery guarantees are to be associated with good numerical performance.

The above comparison was carried out for $\sigma$ set to 0.001. The conducted experiments show that for the
routines in question and our purely Gaussian model of observation errors, the recovery errors are, typically,
proportional to $\sigma$. This is illustrated by the plots in Figure 2, in which we traced the average (over 40
experiments for every grid value of $\sigma$) signal-to-noise ratio (the ratio of the $\|\cdot\|_\infty$-error of the recovery to $\sigma$)
of our favored recovery (r = 2, H = H^'^'^) and the corresponding performance figure for block Lasso.
 r     contrast matrices (five columns)            Lasso
 1     0.30    0.20    0.53    0.60    0.54        N/A
 2     0.76    0.51    0.75    0.79    0.75        0.19
 ∞     0.25    0.18    0.44    0.48    0.44        N/A


Table 2: Ratings of recovery routines. 
Figure 2: Ratio of the $\|\cdot\|_\infty$-recovery error to $\sigma$, averaged over 40 experiments, vs. $\sigma$. In blue: ℓ1/ℓ2 recovery,
H = H^"^^; in red: Lasso recovery.
A Proofs

A.1 Proof of Proposition 2.1



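Let $x \in \mathbb{R}^n$, and let $x^1, ..., x^q$ be obtained from x by the following construction: $x^1$ is obtained from x by
zeroing all but the s largest in magnitude blocks; $x^2$ is obtained by the same procedure applied to $x - x^1$, $x^3$
by the same procedure applied to $x - x^1 - x^2$, and so on; the process is terminated at the first step q when
it happens that $x = x^1 + ... + x^q$. Note that for $j \ge 2$ we have $L_\infty(x^j) \le s^{-1}L_1(x^{j-1})$ and $L_1(x^j) \le L_1(x^{j-1})$,
whence also $\|x^j\|_2 = L_2(x^j) \le \sqrt{L_\infty(x^j)L_1(x^j)} \le s^{-1/2}L_1(x^{j-1})$. Recall that if A is BRIP($\delta$, 2s), then for
every two s-block-sparse vectors u, v with non-overlapping supports we have

$$|v^TA^TAu| \le \delta\|u\|_2\|v\|_2. \tag{$*$}$$

(i): We have

$$\begin{array}{rcl}
\|Ax^1\|_2\|Ax\|_2 &\ge& [x^1]^TA^TAx = \|Ax^1\|_2^2 + \sum_{j=2}^q [x^1]^TA^TAx^j \\
&\ge& \|Ax^1\|_2^2 - \delta\sum_{j=2}^q \|x^1\|_2\|x^j\|_2 \quad [\hbox{by } (*)] \\
&\ge& \|Ax^1\|_2^2 - \delta s^{-1/2}\|x^1\|_2\sum_{j=2}^q L_1(x^{j-1}) \ \ge\ \|Ax^1\|_2^2 - \delta s^{-1/2}\|x^1\|_2L_1(x) \\
\Rightarrow\ \|Ax^1\|_2^2 &\le& \|Ax^1\|_2\|Ax\|_2 + \delta s^{-1/2}\|x^1\|_2L_1(x) \\
\Rightarrow\ L_{s,2}(x) = \|x^1\|_2 &\le& \frac{1}{\sqrt{1-\delta}}\|Ax\|_2 + \frac{\delta}{1-\delta}s^{-1/2}L_1(x) \quad [\hbox{by BRIP}(\delta,2s)]
\end{array}$$

and we see that the pair $\left(H = \frac{s^{-1/2}}{\sqrt{1-\delta}}I_m,\ \|\cdot\|_2\right)$ satisfies $Q_{s,2}\left(\frac{\delta}{1-\delta}\right)$, as claimed in (i).

(ii): We have

$$\begin{array}{rcl}
L_1(x^1)L_\infty(A^TAx) &\ge& [x^1]^TA^TAx = \|Ax^1\|_2^2 + \sum_{j=2}^q [x^1]^TA^TAx^j \\
&\ge& \|Ax^1\|_2^2 - \delta s^{-1/2}\|x^1\|_2L_1(x) \quad [\hbox{exactly as above}] \\
\Rightarrow\ \|Ax^1\|_2^2 &\le& L_1(x^1)L_\infty(A^TAx) + \delta s^{-1/2}\|x^1\|_2L_1(x) \\
\Rightarrow\ (1-\delta)\|x^1\|_2^2 &\le& L_1(x^1)L_\infty(A^TAx) + \delta s^{-1/2}\|x^1\|_2L_1(x) \quad [\hbox{by BRIP}(\delta,2s)] \\
&\le& s^{1/2}\|x^1\|_2L_\infty(A^TAx) + \delta s^{-1/2}\|x^1\|_2L_1(x) \\
\Rightarrow\ L_{s,2}(x) = \|x^1\|_2 &\le& \frac{s^{1/2}}{1-\delta}L_\infty(A^TAx) + \frac{\delta}{1-\delta}s^{-1/2}L_1(x),
\end{array}$$

and we see that the pair $\left(H = \frac{1}{1-\delta}A,\ L_\infty(\cdot)\right)$ satisfies the condition $Q_{s,2}\left(\frac{\delta}{1-\delta}\right)$, as required in (ii). ∎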
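The block decomposition used at the start of this proof, and the inequality $\|x^j\|_2 \le s^{-1/2}L_1(x^{j-1})$ derived from it, can be checked numerically. The sketch below assumes equal block sizes and Euclidean block norms; the helper names `block_norms` and `decompose` are illustrative and do not come from the paper.

```python
import math
import random

def block_norms(x, d):
    """Euclidean norms of the consecutive length-d blocks of x."""
    return [math.sqrt(sum(v * v for v in x[i:i + d])) for i in range(0, len(x), d)]

def decompose(x, d, s):
    """Split x into x^1, x^2, ...: x^1 keeps the s largest blocks, x^2 the next s, etc."""
    norms = block_norms(x, d)
    order = sorted(range(len(norms)), key=lambda k: -norms[k])
    parts = []
    for start in range(0, len(order), s):
        keep = set(order[start:start + s])
        part = [v if (i // d) in keep else 0.0 for i, v in enumerate(x)]
        if any(part):  # skip empty pieces once x is exhausted
            parts.append(part)
    return parts

rng = random.Random(0)
d, s = 3, 2
x = [rng.gauss(0.0, 1.0) for _ in range(8 * d)]
parts = decompose(x, d, s)

# Check ||x^j||_2 <= s^{-1/2} * L_1(x^{j-1}) for every j >= 2.
ok = all(
    math.sqrt(sum(v * v for v in parts[j]))
    <= sum(block_norms(parts[j - 1], d)) / math.sqrt(s) + 1e-12
    for j in range(1, len(parts))
)
```

The inequality holds because every block of $x^j$ is no larger than the smallest of the s blocks of $x^{j-1}$, which in turn is at most $s^{-1}L_1(x^{j-1})$.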



A.2 Proof of Theorems 3.1 and 3.2

All we need is to prove Theorem 3.2, since Theorem 3.1 is the particular case $\varkappa = \kappa < 1/2$ of Theorem 3.2.
Let us fix $x \in \mathbb{R}^n$, $u \in U$ and $\xi \in \Xi$, and let us set $\eta = u + \xi$, $\hat{x} = \hat{x}_{\rm reg}(Ax + \eta)$. Let also $I \subset \{1, ..., K\}$
be the set of indexes of the s largest in magnitude blocks in Bx, J be the complement of I in $\{1, ..., K\}$,
and for $w \in \mathcal{W}$ let $w_I$ and $w_J$ be the vectors obtained from w by zeroing the blocks $w[k]$ with indices $k \notin I$
and $k \notin J$, respectively, and keeping the remaining blocks intact. Finally, let $z = \hat{x} - x$.

1°. By the definition of $\Xi$ and due to $\xi \in \Xi$, $u \in U$, we have

$$\|H^T([Ax + \eta] - Ax)\| \le \rho, \tag{A.51}$$

so that x is a feasible solution to the optimization problem specifying $\hat{x}$, whence $L_1(B\hat{x}) \le L_1(Bx)$. We
therefore have

$$\begin{array}{rcl}
L_1([B\hat{x}]_J) &=& L_1(B\hat{x}) - L_1([B\hat{x}]_I) \le L_1(Bx) - L_1([B\hat{x}]_I) \\
&=& L_1([Bx]_I) + L_1([Bx]_J) - L_1([B\hat{x}]_I) \le L_1([Bz]_I) + L_1([Bx]_J),
\end{array}$$

and therefore

$$L_1([Bz]_J) \le L_1([B\hat{x}]_J) + L_1([Bx]_J) \le L_1([Bz]_I) + 2L_1([Bx]_J).$$

It follows that

$$L_1(Bz) = L_1([Bz]_I) + L_1([Bz]_J) \le 2L_1([Bz]_I) + 2L_1([Bx]_J). \tag{A.52}$$

Further, by definition of $\hat{x}$ we have $\|H^T([Ax + u + \xi] - A\hat{x})\| \le \rho$, which combines with (A.51) to imply that

$$\|H^TA(\hat{x} - x)\| \le 2\rho. \tag{A.53}$$
2°. Since $(H, \|\cdot\|)$ satisfies $Q_{s,1}(\varkappa)$, we have

$$L_{s,1}(Bz) \le s\|H^TAz\| + \varkappa L_1(Bz).$$

By (A.53), it follows that $L_{s,1}(Bz) \le 2s\rho + \varkappa L_1(Bz)$, which combines with the evident inequality $L_1([Bz]_I) \le L_{s,1}(Bz)$ and with (A.52) to imply that

$$L_1([Bz]_I) \le 2s\rho + \varkappa L_1(Bz) \le 2s\rho + 2\varkappa L_1([Bz]_I) + 2\varkappa L_1([Bx]_J),$$

whence

$$L_1([Bz]_I) \le \frac{2s\rho + 2\varkappa L_1([Bx]_J)}{1 - 2\varkappa}.$$

Invoking (A.52), we conclude that

$$L_1(Bz) \le \frac{4s}{1 - 2\varkappa}\left[\rho + \frac{1}{2s}L_1([Bx]_J)\right]. \tag{A.54}$$
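The passage to (A.54) is a direct substitution of the bound on $L_1([Bz]_I)$ into (A.52). Spelled out (using only $\varkappa < 1/2$, as in the theorem), the arithmetic reads:

```latex
\begin{align*}
L_1(Bz) &\le 2L_1([Bz]_I) + 2L_1([Bx]_J) \\
        &\le \frac{4s\rho + 4\varkappa L_1([Bx]_J)}{1-2\varkappa} + 2L_1([Bx]_J) \\
        &=   \frac{4s\rho + 2L_1([Bx]_J)}{1-2\varkappa}
         =   \frac{4s}{1-2\varkappa}\Big[\rho + \frac{1}{2s}L_1([Bx]_J)\Big].
\end{align*}
```

The term $4\varkappa L_1([Bx]_J)$ cancels exactly against the part of $2(1-2\varkappa)L_1([Bx]_J)$ brought over the common denominator.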



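3°. Since $(H, \|\cdot\|)$ satisfies $Q_{s,q}(\kappa)$, we have

$$L_{s,q}(Bz) \le s^{1/q}\|H^TAz\| + \kappa s^{1/q-1}L_1(Bz),$$

which combines with (A.54) and (A.53) to imply that

$$L_{s,q}(Bz) \le 2s^{1/q}\rho + \kappa s^{1/q-1}\,\frac{4s\rho + 2L_1([Bx]_J)}{1 - 2\varkappa} \le \frac{4s^{1/q}[1 + \kappa - \varkappa]}{1 - 2\varkappa}\left[\rho + \frac{1}{2s}L_1([Bx]_J)\right] \tag{A.55}$$

(we have taken into account that $\varkappa < 1/2$ and $\kappa \ge \varkappa$). Let $\theta$ be the (s + 1)-st largest magnitude of the
blocks of Bz, and let $w = Bz - [Bz]^s$. Now (A.55) implies that

$$\theta \le L_{s,q}(Bz)s^{-1/q} \le \frac{4[1 + \kappa - \varkappa]}{1 - 2\varkappa}\left[\rho + \frac{1}{2s}L_1([Bx]_J)\right].$$

Hence we have

$$L_q(w) \le \theta^{\frac{q-1}{q}}L_1(w)^{\frac{1}{q}} \le \theta^{\frac{q-1}{q}}L_1(Bz)^{\frac{1}{q}} \le \frac{4s^{1/q}[1 + \kappa - \varkappa]^{\frac{q-1}{q}}}{1 - 2\varkappa}\left[\rho + \frac{1}{2s}L_1([Bx]_J)\right].$$

Taking into account (A.55) and the fact that the supports of $[Bz]^s$ and w do not intersect, we get

$$L_q(Bz) \le 2^{1/q}\max[L_q([Bz]^s), L_q(w)] = 2^{1/q}\max[L_{s,q}(Bz), L_q(w)] \le \frac{4(2s)^{1/q}[1 + \kappa - \varkappa]}{1 - 2\varkappa}\left[\rho + \frac{1}{2s}L_1([Bx]_J)\right].$$

This bound combines with (A.54), the Hölder inequality and the relation $L_1([Bx]_J) = L_1(Bx - [Bx]^s)$ to
imply the bound claimed in Theorem 3.2. ∎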
A.3 Proof of Theorem 3.3

Let us prove (i). Let us fix $x \in \mathbb{R}^n$, $u \in U$ and $\xi$, and let us set $\eta = u + \xi$, $\hat{x} = \hat{x}_{\rm pen}(Ax + \eta)$. Let also
$I \subset \{1, ..., K\}$ be the set of indices of the s largest in magnitude blocks in Bx, J be the complement of I in
$\{1, ..., K\}$, and for $w \in \mathcal{W}$ let $w_I$, $w_J$ be the vectors obtained from w by zeroing out all blocks with indexes
not in I, respectively, not in J. Finally, let $z = \hat{x} - x$ and $\nu = \|H^T\eta\|$.

1°. We have

$$L_1(B\hat{x}) + \lambda\|H^T(A\hat{x} - Ax - \eta)\| \le L_1(Bx) + \lambda\|H^T\eta\|$$

and

$$\|H^T(A\hat{x} - Ax - \eta)\| = \|H^T(Az - \eta)\| \ge \|H^TAz\| - \|H^T\eta\|,$$

whence

$$L_1(B\hat{x}) + \lambda\|H^TAz\| \le L_1(Bx) + 2\lambda\|H^T\eta\| = L_1(Bx) + 2\lambda\nu. \tag{A.56}$$

We have

$$\begin{array}{rcl}
L_1(B\hat{x}) = L_1(Bx + Bz) &=& L_1([Bx]_I + [Bz]_I) + L_1([Bx]_J + [Bz]_J) \\
&\ge& L_1([Bx]_I) - L_1([Bz]_I) + L_1([Bz]_J) - L_1([Bx]_J),
\end{array}$$

which combines with (A.56) to imply that

$$L_1([Bx]_I) - L_1([Bz]_I) + L_1([Bz]_J) - L_1([Bx]_J) + \lambda\|H^TAz\| \le L_1(Bx) + 2\lambda\nu,$$

or, which is the same,

$$L_1([Bz]_J) - L_1([Bz]_I) + \lambda\|H^TAz\| \le 2L_1([Bx]_J) + 2\lambda\nu. \tag{A.57}$$

Since $(H, \|\cdot\|)$ satisfies $Q_{s,1}(\varkappa)$, we have

$$L_1([Bz]_I) \le L_{s,1}(Bz) \le s\|H^TAz\| + \varkappa L_1(Bz),$$

so that

$$(1 - \varkappa)L_1([Bz]_I) - \varkappa L_1([Bz]_J) - s\|H^TAz\| \le 0. \tag{A.58}$$

Taking the weighted sum of (A.57) and (A.58), the weights being 1, 2, respectively, we get

$$(1 - 2\varkappa)\left[L_1([Bz]_I) + L_1([Bz]_J)\right] + (\lambda - 2s)\|H^TAz\| \le 2L_1([Bx]_J) + 2\lambda\nu,$$

that is (since $\lambda \ge 2s$),

$$L_1(Bz) \le \frac{2\lambda\nu + 2L_1([Bx]_J)}{1 - 2\varkappa}. \tag{A.59}$$

Further, by (A.56) we have

$$\lambda\|H^TAz\| \le L_1(Bx) - L_1(B\hat{x}) + 2\lambda\nu \le L_1(Bz) + 2\lambda\nu,$$

which combines with (A.59) to imply that

$$\lambda\|H^TAz\| \le \frac{2\lambda\nu + 2L_1([Bx]_J)}{1 - 2\varkappa} + 2\lambda\nu = \frac{4\lambda\nu(1 - \varkappa) + 2L_1([Bx]_J)}{1 - 2\varkappa}. \tag{A.60}$$

From $Q_{s,q}(\kappa)$ it follows that

$$L_{s,q}(Bz) \le s^{1/q}\|H^TAz\| + \kappa s^{1/q-1}L_1(Bz),$$
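which combines with (A.60) and (A.59) to imply that

$$\begin{array}{rcl}
L_{s,q}(Bz) &\le& s^{\frac{1}{q}-1}\left[s\|H^TAz\| + \kappa L_1(Bz)\right] \le s^{\frac{1}{q}-1}\left[\frac{4s\nu(1 - \varkappa) + \frac{2s}{\lambda}L_1([Bx]_J)}{1 - 2\varkappa} + \frac{\kappa[2\lambda\nu + 2L_1([Bx]_J)]}{1 - 2\varkappa}\right] \\
&=& s^{\frac{1}{q}}\,\frac{\nu\left[4(1 - \varkappa) + 2s^{-1}\lambda\kappa\right] + 2\left[\lambda^{-1} + s^{-1}\kappa\right]L_1([Bx]_J)}{1 - 2\varkappa} \\
&\le& \frac{4s^{\frac{1}{q}}}{1 - 2\varkappa}\left[1 + \frac{\kappa\lambda}{2s} - \varkappa\right]\left[\nu + \frac{1}{2s}L_1([Bx]_J)\right]
\end{array} \tag{A.61}$$

(recall that $\lambda \ge 2s$, $\kappa \ge \varkappa$, and $\varkappa < 1/2$). It remains to repeat the reasoning following (A.55) in item 3°
of the proof of Theorem 3.2. Specifically, denoting by $\theta$ the (s + 1)-st largest magnitude of blocks of Bz,
(A.61) implies that

$$\theta \le s^{-1/q}L_{s,q}(Bz) \le \frac{4}{1 - 2\varkappa}\left[1 + \frac{\kappa\lambda}{2s} - \varkappa\right]\left[\nu + \frac{1}{2s}L_1([Bx]_J)\right], \tag{A.62}$$

so that for the vector $w = Bz - [Bz]^s$ one has

$$L_q(w) \le \theta^{\frac{q-1}{q}}L_1(w)^{\frac{1}{q}} \le \frac{4(\lambda/2)^{\frac{1}{q}}}{1 - 2\varkappa}\left[1 + \frac{\kappa\lambda}{2s} - \varkappa\right]^{\frac{q-1}{q}}\left[\nu + \frac{1}{2s}L_1([Bx]_J)\right]$$

(we have used (A.62), (A.59) and the fact that $\lambda \ge 2s$). Hence, taking into account that $[Bz]^s$ and w have
non-intersecting supports,

$$L_q(Bz) \le 2^{\frac{1}{q}}\max[L_q([Bz]^s), L_q(w)] = 2^{\frac{1}{q}}\max[L_{s,q}(Bz), L_q(w)] \le \frac{4\lambda^{\frac{1}{q}}}{1 - 2\varkappa}\left[1 + \frac{\kappa\lambda}{2s} - \varkappa\right]\left[\nu + \frac{1}{2s}L_1([Bx]_J)\right]$$

(we have used (A.61)). This combines with (A.59) and the Hölder inequality to imply (3.16). All remaining
claims of Theorem 3.3 are immediate corollaries of (3.16). ∎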



A.4 Proof of Proposition 4.1

(i): Let $H \in \mathbb{R}^{m \times N}$, $\|\cdot\|$, $\rho$ satisfy (!). Then for every $k \le K$ and every $1 \le i \le n_k$, denoting by $w_{ki}$ the i-th
entry in $w[k]$, $w \in \mathcal{W}$, we have

$$|[Bx]_{ki}| \le \|H^TAx\| + s^{-1}\kappa L_1(Bx),$$

or, which is the same by homogeneity,

$$\min_x\left\{\|H^TAx\| - [Bx]_{ki} : L_1(Bx) \le 1\right\} \ge -s^{-1}\kappa.$$

In other words, the optimal value ${\rm Opt}_{ki}$ of the conic optimization problem

$${\rm Opt}_{ki} = \min_{x,t}\left\{t - [e^{ki}]^TBx : \|H^TAx\| \le t,\ L_1(Bx) \le 1\right\},$$

where $e^{ki} \in \mathcal{W}$ is the vector with the only nonzero entry, equal to 1, placed at the i-th position of the k-th block,
is $\ge -s^{-1}\kappa$. Since the problem clearly is strictly feasible, this is the same as to say that the dual problem

$$\max_{\eta, g, \mu}\left\{-\mu : A^TH\eta + B^Tg = B^Te^{ki},\ \|g[\ell]\|_1 \le \mu,\ 1 \le \ell \le K,\ \|\eta\|_* \le 1\right\},$$

where $\|\cdot\|_*$ is the norm conjugate to $\|\cdot\|$, has a feasible solution with the value of the objective $\ge -s^{-1}\kappa$.
It follows that there exist $\eta = \eta^{ki}$ and $g = g^{ki}$ such that

$$(a):\ B^Te^{ki} = A^Th^{ki} + B^Tg^{ki}, \quad (b):\ h^{ki} := H\eta^{ki},\ \|\eta^{ki}\|_* \le 1, \quad (c):\ \|g^{ki}[\ell]\|_1 \le s^{-1}\kappa,\ 1 \le \ell \le K. \tag{A.63}$$

Denoting by $h^\ell$ the $\ell$-th column in the $m \times N$ matrix $[h^{11}, ..., h^{1n_1}, h^{21}, ..., h^{2n_2}, ..., h^{K1}, ..., h^{Kn_K}]$, defining
$V^{k\ell}$ as the $n_k \times n_\ell$ matrix with the rows $[g^{ki}[\ell]]^T$, $i = 1, ..., n_k$, and setting $V = [V^{k\ell}]_{k,\ell=1}^K$, (A.63 a,c) ensure
the validity of (4.20 a,b) (recall that $\|M\|_{\infty,\infty}$ is the maximum of $\|\cdot\|_1$-norms of the rows in M). Besides,
by (A.63 b) and the definition of $\Xi$ (see (!)) we have

$$\xi \in \Xi \ \Rightarrow_a\ |[h^\ell]^T[u + \xi]| \le \rho\ \ \forall u \in U,\ \forall \ell \le N \ \Rightarrow_b\ \max_{u \in U}u^Th^\ell + |\xi^Th^\ell| \le \rho\ \ \forall \ell \le N,$$

where the implication $\Rightarrow_a$ is due to the fact that $|[h^{ki}]^T\zeta| = |[\eta^{ki}]^TH^T\zeta| \le \|H^T\zeta\|$ for all $\zeta$ because of
$\|\eta^{ki}\|_* \le 1$, and the implication $\Rightarrow_b$ is due to the fact that U is symmetric w.r.t. the origin. We conclude
that $\Xi \subset \Xi^+$ and thus $P(\Xi^+) \ge P(\Xi) \ge 1 - \epsilon$, as required in (4.20 c). (i) is proved.

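(ii): Let $H = [h^1, ..., h^N]$, $V = [V^{k\ell}]_{k,\ell=1}^K$, $\rho$ satisfy (4.20). Then for every $x \in \mathbb{R}^n$ we have

$$w := Bx = VBx + H^TAx = Vw + v, \qquad v := H^TAx,$$

whence $w[k] = \sum_{\ell=1}^K V^{k\ell}w[\ell] + v[k]$, so that

$$\|w[k]\|_{(k)} = \|w[k]\|_\infty \le \sum_{\ell=1}^K \|V^{k\ell}\|_{\infty,\infty}\|w[\ell]\|_\infty + \|v[k]\|_\infty \le s^{-1}\kappa L_1(w) + \|H^TAx\|_\infty.$$

We conclude that

$$L_{s,\infty}(Bx) = \max_k \|w[k]\|_{(k)} \le \|H^TAx\|_\infty + s^{-1}\kappa L_1(Bx)$$

for all x, meaning that $(H, \|\cdot\|_\infty)$ satisfies $Q_{s,\infty}(\kappa)$. Further, we have

$$\Xi := \left\{\xi : \|H^T[u + \xi]\|_\infty \le \rho\ \ \forall u \in U\right\} = \left\{\xi : |[h^\ell]^T[u + \xi]| \le \rho\ \ \forall u \in U,\ \forall \ell \le N\right\},$$

whence $P(\Xi) = P(\Xi^+) \ge 1 - \epsilon$. Thus, H, $\|\cdot\|_\infty$, $\rho$ satisfy (!). ∎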



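A.5 Proof of Proposition 4.2

Notation. Let $1 \le k \le K$ and $1 \le i \le n_k$. For a vector $w \in \mathcal{W}$, we set $[w]_{ki}$ to be the i-th coordinate in $w[k]$.
For a vector $u \in \mathbb{R}^{n_k}$, we set $\|u\|_{\hat\imath} = \max_{j \ne i}|u_j|$, with the convention that the latter maximum is 0 when
$n_k = 1$. Further, let $e^{ki}$ be the vector from $\mathcal{W}$ such that $[e^{ki}]_{\ell j} = 1$ when $\ell = k$ and $j = i$ and $[e^{ki}]_{\ell j} = 0$ for
all remaining pairs $\ell, j$. Finally, let $B = [B^1; ...; B^K]$ with $n_k \times n$ matrices $B^k$.

1°. Let us fix k, i, $1 \le k \le K$, $1 \le i \le n_k$, let $M = \alpha S$, and let $a > \alpha + S^{-1}M = 2\alpha$. We set

$$X_+^{ki} = \Big\{x \in \mathbb{R}^n : [B^kx]_i = a,\ \|B^kx\|_{\hat\imath} + \sum_{\ell \ne k}\|B^\ell x\|_\infty \le M\Big\}, \quad X_-^{ki} = -X_+^{ki},$$

$$Y_+^{ki} = AX_+^{ki}, \quad Y_-^{ki} = AX_-^{ki} = -Y_+^{ki}.$$

We denote $\mathcal{V} = 2U + 2\{D\eta : \|\eta\|_2 \le {\rm ErfInv}(\epsilon)\}$.

It may happen that $X_+^{ki} = \emptyset$. This is exactly the same as to say that the optimal value in the strictly
feasible conic optimization problem

$$\max_x\Big\{[e^{ki}]^TBx : \|B^kx\|_{\hat\imath} + \sum_{\ell \ne k}\|B^\ell x\|_\infty \le M\Big\}$$

is < a, meaning that the optimal value in the dual problem

$$\min_{v \in \mathcal{W}, t}\Big\{Mt : [v]_{ki} = 0,\ \sum_{\ell}[B^\ell]^Tv[\ell] = B^Te^{ki},\ \max_{1 \le \ell \le K}\|v[\ell]\|_1 \le t\Big\}$$

is < a, whence there exists $v^{ki} \in \mathcal{W}$ such that $[v^{ki}]_{ki} = 0$, $B^Tv^{ki} = B^Te^{ki}$ and $M\max_\ell\|v^{ki}[\ell]\|_1 < a$, that
is, $\max_\ell\|v^{ki}[\ell]\|_1 < a/M$. Thus, when $X_+^{ki}$ is empty, setting $h^{ki} = 0 \in \mathbb{R}^m$, we get vectors $h^{ki} \in \mathbb{R}^m$ and
$v^{ki} \in \mathcal{W}$ such that

$$\begin{array}{ll}
(a_{ki}) & B^Tv^{ki} + A^Th^{ki} = B^Te^{ki}, \\
(b_{ki}) & [v^{ki}]_{ki} = 0,\ \max_{1 \le \ell \le K}\|v^{ki}[\ell]\|_1 \le aM^{-1}, \\
(c_{ki}) & \max_{u \in U}u^Th^{ki} + {\rm ErfInv}(\epsilon)\|D^Th^{ki}\|_2 \le a.
\end{array} \tag{A.64}$$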



2°. Assume now that $X_+^{ki} \ne \emptyset$. Then $Y_\pm^{ki}$ are nonempty convex sets. We claim that whenever $0 \le \theta < 1$,
the convex compact set $\theta\mathcal{V}$ does not intersect the convex set $2Y_+^{ki}$. Indeed, if the opposite is true, there
exist $v \in U$ and e, $\|e\|_2 \le {\rm ErfInv}(\epsilon)$, such that $\theta(v + De) = Az$ with $z \in X_+^{ki}$. Now consider two hypotheses
on the mean $\mu \in \mathbb{R}^m$ of the distribution of a Gaussian vector $\zeta \sim \mathcal{N}(\mu, DD^T)$:

$$H_+:\ \mu = \theta De, \quad\hbox{and}\quad H_-:\ \mu = -\theta De.$$

Let us consider the following procedure for distinguishing between these two hypotheses: given $\zeta$, we compute
$\hat{x}(\zeta)$ and accept $H_+$ when $[B\hat{x}(\zeta)]_{ki} > 0$, otherwise we accept $H_-$. We claim that this procedure rejects the
true hypothesis with probability $\le \epsilon$. Indeed, applying (4.26) to $u = -\theta v$ and $x = z$, we get

$${\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{\|B[z - \hat{x}(Az - \theta v + D\eta)]\|_\infty \le \alpha + S^{-1}L_1(Bz - [Bz]^s)\right\} \ge 1 - \epsilon.$$

Since $Az = \theta v + \theta De$ and $L_1(Bz - [Bz]^s) \le \sum_{\ell \ne k}\|B^\ell z\|_\infty \le M$, we get

$$\alpha + S^{-1}L_1(Bz - [Bz]^s) \le \alpha + S^{-1}M = 2\alpha,$$

while $[Bz]_{ki} = a > 2\alpha$. It follows that if $\eta$ is such that

$$\|B[z - \hat{x}(Az - \theta v + D\eta)]\|_\infty \le \alpha + S^{-1}L_1(Bz - [Bz]^s),$$

then $a - [B\hat{x}(\theta De + D\eta)]_{ki} \le 2\alpha$, whence $[B\hat{x}(\theta De + D\eta)]_{ki} > 0$. We see that

$${\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{[B\hat{x}(\theta De + D\eta)]_{ki} > 0\right\} \ge 1 - \epsilon,$$

that is, our rule for distinguishing between $H_+$ and $H_-$ rejects $H_+$ when this hypothesis is true with
probability $\le \epsilon$. Similarly, applying (4.26) to $u = \theta v$ and $x = -z$, we get

$${\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{\|B[-z - \hat{x}(-Az + \theta v + D\eta)]\|_\infty \le \alpha + S^{-1}L_1(Bz - [Bz]^s)\right\} \ge 1 - \epsilon.$$

Since $-Az = -\theta v - \theta De$, we, same as above, observe that

$$\|B[-z - \hat{x}(-Az + \theta v + D\eta)]\|_\infty \le \alpha + S^{-1}L_1(Bz - [Bz]^s)$$

implies that $[B\hat{x}(-\theta De + D\eta)]_{ki} < 0$, and thus

$${\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{[B\hat{x}(-\theta De + D\eta)]_{ki} < 0\right\} \ge 1 - \epsilon,$$

that is, the probability to reject $H_-$ when the hypothesis is true is $\le \epsilon$. On the other hand, to distinguish
between the hypotheses $H_\pm$ via observation $\zeta$ is exactly the same as to distinguish between the distributions
$\mathcal{N}(-\theta e, I_m)$ and $\mathcal{N}(\theta e, I_m)$; to do it with probabilities $\le \epsilon$ to reject the true distribution is possible only
when $\|\theta e\|_2 \ge {\rm ErfInv}(\epsilon)$, which is not the case due to $\|e\|_2 \le {\rm ErfInv}(\epsilon)$ and $0 \le \theta < 1$. The resulting
contradiction demonstrates that $\theta\mathcal{V}$ does not intersect $2Y_+^{ki}$.
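The last step of this argument relies on a standard fact about testing $\mathcal{N}(-\theta e, I_m)$ against $\mathcal{N}(\theta e, I_m)$. The sketch below illustrates it numerically under two assumptions that are not spelled out in this section: that ${\rm ErfInv}(\epsilon)$ denotes the upper $\epsilon$-quantile of the standard normal distribution, and that the best test with equal error probabilities thresholds the projection of the observation on e, so that each error probability equals $\Phi(-\|\theta e\|_2)$. The function names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def erf_inv(eps):
    """Assumed meaning of ErfInv: Prob{N(0,1) > erf_inv(eps)} = eps."""
    return std_normal.inv_cdf(1.0 - eps)

def best_error(shift_norm):
    """Error probability of the symmetric likelihood-ratio test for
    N(-h, I) vs N(h, I) with ||h||_2 = shift_norm."""
    return std_normal.cdf(-shift_norm)

eps = 0.05
r = erf_inv(eps)

err_at_boundary = best_error(r)   # shift exactly ErfInv(eps)
err_inside = best_error(0.9 * r)  # shift theta * ErfInv(eps) with theta = 0.9 < 1
```

With a shift of norm exactly ${\rm ErfInv}(\epsilon)$ the error probability is $\epsilon$, and any smaller shift gives a strictly larger error, which is the contradiction used in the proof.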
3°. Since $\theta\mathcal{V}$ does not intersect $2Y_+^{ki}$ when $\theta < 1$, the sets $\mathcal{V}$ and $2Y_+^{ki}$ can be separated by a linear form,
which can be normalized to be $\ge 2$ on $2Y_+^{ki}$ and $\le 2$ on $\mathcal{V}$ (recall that $0 \in {\rm int}\,\mathcal{V}$). In other words, there
exists $g = g^{ki} \in \mathbb{R}^m$ such that $\max_{v \in \mathcal{V}}g^Tv \le 2$ and $\inf_{y \in 2Y_+^{ki}}g^Ty \ge 2$. Recalling the origin of $\mathcal{V}$, the first relation
amounts to

$$\max_{u \in U}u^Tg + {\rm ErfInv}(\epsilon)\|D^Tg\|_2 \le 1, \tag{A.65}$$

while the relation $g^Ty \ge 2$ for all $y \in 2Y_+^{ki} = 2AX_+^{ki}$ amounts to $f^Tx \ge 1$ for all $x \in X_+^{ki}$, where $f = A^Tg$.
Recalling the definition of $X_+^{ki}$, it follows that

$$\min_x\Big\{f^Tx : [e^{ki}]^TBx = a,\ \|B^kx\|_{\hat\imath} + \sum_{\ell \ne k}\|B^\ell x\|_\infty \le M\Big\} \ge 1.$$

Passing to the dual problem, the latter inequality results in

$$\exists (t \in \mathbb{R},\ y \in \mathcal{W}):\quad f = B^Ty,\quad ay_{ki} - Mt \ge 1,\quad \sum_{j \ne i}|y_{kj}| \le t,\quad \max_{\ell \ne k}\|y[\ell]\|_1 \le t. \tag{A.66}$$

For the above t, y we have $0 \le t \le (ay_{ki} - 1)/M$, so that $y_{ki} > 0$; setting

$$h^{ki} = \frac{g}{y_{ki}}, \qquad [v^{ki}]_{\ell j} = \left\{\begin{array}{ll} 0, & \ell = k,\ j = i \\ -y_{\ell j}/y_{ki}, & \hbox{otherwise,} \end{array}\right.$$

(A.66) combines with $f = A^Tg$ to imply that $B^Te^{ki} = B^Tv^{ki} + A^Th^{ki}$. Recall that, by construction, we have
$[v^{ki}]_{ki} = 0$. Further, by (A.66) we have $\|v^{ki}[\ell]\|_1 \le t/y_{ki} \le a/M$, so that $v^{ki}$, $h^{ki}$ satisfy (A.64 $(a_{ki})$, $(b_{ki})$).
Finally, by (A.66) we have $1/y_{ki} \le a$, which combines with (A.65) to imply (A.64 $(c_{ki})$).

4°. The bottom line here is that for every $a > 2\alpha$, every k, $1 \le k \le K$, and every i, $1 \le i \le n_k$, there exist
vectors $h^{ki} \in \mathbb{R}^m$ and $v^{ki} \in \mathcal{W}$ satisfying (A.64). It immediately follows that (A.64) can be satisfied when
$a = 2\alpha$ as well. Assembling the corresponding $h^{ki}$ and $v^{ki}$ into matrices H and V,
we get (4.27) as an immediate consequence of (A.64) with a set to $2\alpha$. ∎



A.6 Proofs for Section 5.1

Proof of Proposition 5.1. Let $V^\ell = [V^{1\ell}; ...; V^{K\ell}]$, $1 \le \ell \le K$, be the "stripes" of V. Given $x \in \mathbb{R}^n$,
setting w = Bx, and using the relation (5.28), we have

$$w = Bx = [VB + H^TA]x = Vw + H^TAx,$$



whence, 



= Ls,, (jZv^'^^l]^ +sh^{H^Ax) 

K 

< \\[Ww[e]\\^,y, W'w[i]\\^K)]\\s,g + sh^iH^Ax) 

K 

^ 5^<,(^)lkMllw +s^ioo(^''^x) = K,<,{y)Liiw) + sh^{H''Ax) 
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proving (5.29). 



Proof of Proposition 5.2 To verify (5.32), note that for every k and every we have 

o<||F'=^«;MII(fc)<lkMII(,) 



niax \\V = ||t«M||(^)||V' = \\w[t\\\(^e)^LM. 

«ieIR"i:||w)||(<)<l 



Since for any two nonnegative vectors a, b satisfying $a_i \le b_i$ for all i we have $\|a\|_{s,q} \le \|b\|_{s,q}$, we get

\\[\\[v''wmiiy,.-A\\v'''wmiK)]\^^^^ 

By taking the maximum of both sides first with respect to w[P\ subject to the constraint that ||?^[^]||(^) < 1, 



and then over 1 < ^ < K, we arrive at (5.32) implying that v*{V) < Us^q{V). 



A.7 Derivations for Section 5.2

Derivation of Sk,<j in aU three cases (i.e., r = oo, r = 2, r = 1) is quite straightforward — we just plug into 



(5.33) the description of i's,q{-)- Let us focus on the derivation of R^. 



The case of r = ∞. In this case, $L_\infty(H^T(u + \xi)) = \|H^T(u + D\eta)\|_\infty$, so that the requirement

$$L_\infty(H^T(u + D\eta)) \le \rho \quad \forall u \in U \tag{A.67}$$

amounts to

$$\max_{1 \le k \le K,\ 1 \le i \le n_k}\left[\|E^Th^{ki}\|_2 + |\eta^TD^Th^{ki}|\right] \le \rho. \tag{A.68}$$

The condition (5.35 R^) is a natural sufficient condition for (A.68) to be satisfied with probability $\ge 1 - \epsilon$
when $\eta \sim \mathcal{N}(0, I_m)$ (cf. the derivations in Section 4.1).



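The case of r = 2. Here a slightly conservative (tight within factor 2) sufficient condition for (A.67) reads

$$\max_{1 \le k \le K}\left\{\|E^TH[k]\|_{2,2} + \|H^T[k]D\eta\|_2\right\} \le \rho. \tag{A.69}$$

A simple sufficient condition for (A.69) to be satisfied with probability $\ge 1 - \epsilon$ when $\eta \sim \mathcal{N}(0, I_m)$ is for the
probability of the event

$$\left\{\eta : \|E^TH[k]\|_{2,2} + \sqrt{\eta^TD^TH[k]H^T[k]D\eta} \le \rho\right\} \tag{A.70}$$

to be $\ge 1 - \epsilon/K$ for every $k \le K$. For a given k, invoking the Schur Complement Lemma, this condition is
the same as the existence of a symmetric matrix $W_k$ and a real $\alpha_k$ such that

$$\left[\begin{array}{cc} W_k & D^TH[k] \\ H^T[k]D & \alpha_kI_{n_k} \end{array}\right] \succeq 0 \quad\&\quad \|E^TH[k]\|_{2,2} + \alpha_k \le \rho \quad\&\quad {\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{\eta : \eta^TW_k\eta \le \alpha_k\right\} \ge 1 - \epsilon/K. \tag{A.71}$$

By O Proposition 4.5.10], for every symmetric m × m matrix W, setting $\lambda = \lambda(W)$, one has

$$\forall \chi > 0:\quad {\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{\eta^TW\eta > {\rm Tr}(W) + \chi\|\lambda\|_2\right\} \le \exp\left\{-\frac{\chi^2\|\lambda\|_2}{4(2\|\lambda\|_2 + \|\lambda\|_\infty\chi)}\right\}.$$

The smallest $\chi > 0$ for which the right hand side of this inequality is $\le \epsilon/K$ satisfies

$$\chi\|\lambda\|_2 = 2\delta\|\lambda\|_\infty + \sqrt{4\|\lambda\|_\infty^2\delta^2 + 8\|\lambda\|_2^2\delta}, \qquad \delta = \ln\left(\frac{K}{\epsilon}\right),$$

and we conclude that the condition

$${\rm Tr}(W_k) + 2\delta\|\lambda(W_k)\|_\infty + \sqrt{4\delta^2\|\lambda(W_k)\|_\infty^2 + 8\delta\|\lambda(W_k)\|_2^2} \le \alpha_k$$

is sufficient for the validity of the third condition in (A.71). This observation implies straightforwardly that
the condition (5.36 R^) indeed is sufficient for the validity of (A.69) with probability $\ge 1 - \epsilon$, which is all
we need to prove.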
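The closed form for $\chi\|\lambda\|_2$ is the positive root of a quadratic equation. The sketch below, written in terms of $t = \chi\|\lambda\|_2$, evaluates that root and checks that it makes the exponential bound equal to $\epsilon/K$; it assumes the tail bound has the form $\exp\{-t^2/(4(2\|\lambda\|_2^2 + \|\lambda\|_\infty t))\}$, and the function names and sample eigenvalues are illustrative.

```python
import math

def deviation_level(lam, K, eps):
    """t = chi * ||lam||_2 solving t^2 = 4 * delta * (2 * ||lam||_2^2 + ||lam||_inf * t)."""
    delta = math.log(K / eps)
    norm_inf = max(abs(v) for v in lam)
    norm_2 = math.sqrt(sum(v * v for v in lam))
    return 2 * delta * norm_inf + math.sqrt(
        4 * delta ** 2 * norm_inf ** 2 + 8 * delta * norm_2 ** 2
    )

def tail_bound(lam, t):
    """Assumed upper bound on Prob{eta' W eta > Tr(W) + t}, eta ~ N(0, I)."""
    norm_inf = max(abs(v) for v in lam)
    norm_2 = math.sqrt(sum(v * v for v in lam))
    return math.exp(-t * t / (4 * (2 * norm_2 ** 2 + norm_inf * t)))

lam = [1.0, 0.5, -0.25]  # eigenvalues of a hypothetical symmetric W
K, eps = 10, 0.01
t_star = deviation_level(lam, K, eps)
bound_at_t_star = tail_bound(lam, t_star)
```

At `t_star` the bound equals $\epsilon/K$ up to rounding, so ${\rm Tr}(W) + t_\star$ is the quantile level entering the sufficient condition above.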



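The case of r = 1. Here a slightly conservative (tight within factor 2) sufficient condition for (A.67) reads

$$\max_{1 \le k \le K}\left\{\max_{\|v\|_2 \le 1}\|H^T[k]Ev\|_1 + \|H^T[k]D\eta\|_1\right\} \le \rho. \tag{A.72}$$

A natural way to enforce the validity of this condition with probability $\ge 1 - \epsilon$ when $\eta \sim \mathcal{N}(0, I_m)$ is to
ensure that for every $k \le K$ we have

$${\rm Prob}_{\eta \sim \mathcal{N}(0, I_m)}\left\{\eta : \|H^T[k]E\|_{2,1} + \|H^T[k]D\eta\|_1 \le \rho\right\} \ge 1 - \epsilon/K. \tag{A.73}$$

A natural upper bound on the $(1 - \epsilon/K)$-quantile of $\|H^T[k]D\eta\|_1$ is ${\rm ErfInv}\left(\frac{\epsilon}{2Kn_k}\right)\sum_{t=1}^{n_k}\|D^Th^{kt}\|_2$, where $h^{kt}$ is the t-th column of $H[k]$, so that
the relation

$$\|H^T[k]E\|_{2,1} + {\rm ErfInv}\left(\frac{\epsilon}{2Kn_k}\right)\sum_{t=1}^{n_k}\|D^Th^{kt}\|_2 \le \rho \tag{A.74}$$

is sufficient for (A.73). The latter constraint, while convex in variables H, $\rho$, is intractable, since the norm $\|H^T[k]E\|_{2,1}$ is difficult
to compute. This norm, however, admits a tight within the factor $\sqrt{\pi/2} \approx 1.25$ efficiently computable upper
bound, due to Yu. Nesterov (see [32, Theorem 13.2.4]); namely, for every $G \in \mathbb{R}^{p \times q}$ one has

$$\|G\|_{2,1} \le \min_{\Lambda, \mu}\left\{\frac{1}{2}\Big[\mu + \sum_i\Lambda_i\Big] : \left[\begin{array}{cc} {\rm Diag}\{\Lambda\} & G \\ G^T & \mu I_q \end{array}\right] \succeq 0\right\}. \tag{A.75}$$

Replacing in (A.74) the quantity $\|H^T[k]E\|_{2,1}$ with its upper bound, we arrive at (5.37).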



A.8 Proof of Proposition 5.3

Let H, V satisfy (5.38) with B = I, that is, $V = I_n - H^TA$, V is n × n, and H is m × n. Observe, first,
that $sd \le m$; indeed, otherwise, by a dimension argument, we could find an s-block-sparse signal $x \ne 0$ such
that Ax = 0, meaning that there is no way to recover x even from the exact observation of Ax, and thus
our sufficient condition for the validity of ℓ1-recovery cannot hold true. Now let us set $\bar{K} = {\rm Ceil}(2m/d)$
and $\bar{n} = \bar{K}d$. Note that $\bar{K} \le K$ due to $n = Kd \ge 2m$, whence $\bar{n} \le n$, and of course $\bar{n} \le 2m + d$ due to
$(\bar{K} - 1)d \le 2m$. Now let $\bar{V} = [V^{k\ell}]_{k,\ell=1}^{\bar{K}}$ be the $\bar{n} \times \bar{n}$ "North-Western" block of V, let $\bar{A}$ be the $m \times \bar{n}$ matrix
comprised of the first $\bar{n}$ columns of A, and let $\bar{H}$ be the $m \times \bar{n}$ matrix comprised of the first $\bar{n}$ columns of H. Since
$V = I_n - H^TA$, we have $\bar{V} = I_{\bar{n}} - \bar{H}^T\bar{A}$. Moreover, (5.38) implies that



w 



1£\ 



w 



2£\ 



\s,q 



<a:-- 



< K. 



(A.76) 



Now, $\bar{V} = I_{\bar{n}} - \bar{H}^T\bar{A}$, and since the rank of $\bar{H}^T\bar{A}$ is $\le m$, at least $\bar{n} - m$ singular values of $\bar{V}$ are $\ge 1$,
and therefore the squared Frobenius norm $\|\bar{V}\|_F^2$ of $\bar{V}$ is at least $\bar{n} - m$. On the other hand, we can upper-bound
this squared norm as follows. It is immediately seen that with the values of r in question, for every


[Footnote to (A.75).] To see why the right-hand side of (A.75) is indeed an upper bound for $\|G\|_{2,1}$, take $\|v\|_2 \le 1$, set $g = Gv$ and note that
by the Schur Complement Lemma, for $\Lambda$, $\mu$ feasible for the right hand side problem in (A.75), we have $\sum_i g_i^2/\Lambda_i \le \mu v^Tv \le \mu$;
noting that $[\sum_i|g_i|]^2 \le [\sum_i\Lambda_i][\sum_i g_i^2/\Lambda_i]$ whenever $\Lambda_i > 0$, we conclude that $\|g\|_1 \le \sqrt{\mu\sum_i\Lambda_i} \le \frac{1}{2}[\mu + \sum_i\Lambda_i]$. Recalling
the origin of g, we conclude that the right hand side in (A.75) indeed is an upper bound on $\|G\|_{2,1}$.
d × d block $V^{k\ell}$ in $\bar{V}$ we have $\|V^{k\ell}\|_F \le \sqrt{d}\,\|V^{k\ell}\|_{r,r}$. Since for every $\bar{K}$-dimensional vector f one has



rkl 



< max 



K 



s,q 1 



we now get 



K K 



n — m 



< \\v\\i = EE\\y'r.<dY.Y.\\^ 

K 



K 



< ci y max 



1 k=l 

K 



K K K 

'''lr = dY.\\f\\l 
1 k=l 1=1 



llf IIL < max 



,1 



dKa^ = -nmax 
4 



,-1- -2 --2 

d ns ,si 



We conclude that either 



or 



^-2 



SI 



> 



4(n 



m] 



n 



n 

2 



4d{n 



m) 



(a) 



(&) 



Since $\bar{n} \ge 2m$, the right hand side in (a) is $\ge 2$; thus, (a) is impossible. Taking into account that, as we
have seen, $\bar{n} - m \ge m$ and $\bar{n} \le 2m + d \le 3m$, (b) implies (5.39). ∎



A.9 Proof of Proposition 5.4

1°. The proof of (i) is immediate. Indeed, let us look at the blocks $V^{k\ell}$ in the matrix $V = I - H^TA$. When
$k = \ell$, we have $V^{kk} = \left(1 - \frac{1}{1+\mu}\right)I_{n_k}$ by the definition of $C_k$, whence $\|V^{kk}\|_{2,2} = \frac{\mu}{1+\mu}$. When $k \ne \ell$,
we have $V^{k\ell} = -\frac{1}{1+\mu}C_k^{-1}A^T[k]A[\ell]$, that is, $\|V^{k\ell}\|_{2,2} \le \frac{\mu}{1+\mu}$ by the definition of $\mu$. The bottom line is that
$\|V^{k\ell}\|_{2,2} \le \frac{\mu}{1+\mu}$ for all k, $\ell$. We see that (5.44) indeed
implies the validity of (5.33), which is all we need.



2°. Let us justify the bound on the mutual block-incoherence stated in (ii). Specifically, let $n_k = d$,
$1 \le k \le K$, let A be a matrix with unit columns, let $\nu$, $\mu_B$ be given by (5.41), and let the first relation in
(5.42) be satisfied; we want to prove that the mutual block-incoherence of A is $\le \bar\mu = \frac{d\mu_B}{1 - (d-1)\nu}$. This is
immediate: since the columns of A have unit $\ell_2$-norms, the diagonal entries in the d × d symmetric matrix
$C_k = A^T[k]A[k]$ are equal to 1, while the moduli of the off-diagonal entries in $C_k$ do not exceed $\nu$ by the
definition of $\nu$. It immediately follows that the minimal eigenvalue of $C_k$ is $\ge 1 - (d-1)\nu$, and thus $C_k$ is
positive definite (by the first relation in (5.42)) and $\sigma_{\max}(C_k^{-1}) \le [1 - (d-1)\nu]^{-1}$. Therefore, whenever $k \ne \ell$,
we have

$$\sigma_{\max}(C_k^{-1}A^T[k]A[\ell]) \le (1 - (d-1)\nu)^{-1}\sigma_{\max}(A^T[k]A[\ell]) \le \bar\mu,$$

where the concluding inequality is a consequence of the definition of $\mu_B$, see (5.41). Recalling the definition
of the mutual block-incoherence, we arrive at the desired conclusion. ∎



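A.10 Proof of Proposition 5.5

1°. Let $\xi \sim \mathcal{N}(0, m^{-1}I_m)$ and $\eta \sim \mathcal{N}(0, m^{-1}I_m)$ be two independent normal vectors. Then for any $0 < \lambda \le \sqrt{m}$ we have

$${\rm Prob}\left\{|\xi^T\eta| \ge \frac{\lambda}{\sqrt{m}}\right\} \le 2e^{-\lambda^2/3}; \qquad {\rm Prob}\left\{|\xi^T\xi - 1| \ge \frac{\lambda}{\sqrt{m}}\right\} \le 2e^{-\lambda^2/8}. \tag{A.77}$$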
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Indeed, by the standard argument, we have for < t < ^Jm: hi ( E 

< 1*2^ and for < 



ln(l- 



We conclude 



that when t'^ <f,ln{E 



A 



9m 
8 



Prob{|C' ??| > —=} = 2Prob{^^r? > 



Further, we have for t < 

In (^E [e*v^(^^«-i) 
In [e*v^(i-«^«) 
Consequently, when A < \/rn, 

Prob{|C^^ 



A , 3+2 

} < 2 min e* 

\t\<\/m/2 



m 



ln(l 



2t 



-^ln(l + 



'm 
2t 



+ t^/m < r. 



m 



1| > 



A 



} < 2 min e 

m |t|<\/m/4 



2t^-\t 



2e~^, 



and we arrive at (A. 77). 



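2°. Now, let $S = \{u \in \mathbb{R}^d : \|u\|_2 = 1\}$, let $\mathcal{D}_\epsilon$ be a minimal $\epsilon$-net, w.r.t. $\|\cdot\|_2$, in S, and let $N_\epsilon$ be
the cardinality of $\mathcal{D}_\epsilon$. Further, let $1 \le k, \ell \le K$ and let $A[\ell]$ and $A[k]$ be the submatrices of A comprised of
columns of A with indices in the $\ell$-th and the k-th blocks. Finally, let $C^\ell = A[\ell]^TA[\ell] - I_d$,
and let $x, y \in S$ be deterministic. When invoking (A.77) for $\xi = A[\ell]x$ and $\eta = A[k]y$ we conclude that

$${\rm Prob}\left\{|x^TA[\ell]^TA[k]y| \ge \frac{\lambda}{\sqrt{m}}\right\} \le 2e^{-\lambda^2/3},\ k \ne \ell; \qquad {\rm Prob}\left\{|x^TC^\ell x| \ge \frac{\lambda}{\sqrt{m}}\right\} \le 2e^{-\lambda^2/8}. \tag{A.78}$$

We claim that

$$\left\{|v^TC^\ell v| \le \Lambda\ \ \forall v \in \mathcal{D}_\epsilon\right\} \ \Rightarrow\ \left\{\sigma_{\max}(C^\ell) \le (1 - 2\epsilon)^{-1}\Lambda\right\}, \tag{A.79}$$

where $\sigma_{\max}(\cdot)$ is the spectral norm. Indeed, let the premise in (A.79) hold true. Recall that $C^\ell$ is symmetric,
so let $\bar{u} \in S$ be such that $|\bar{u}^TC^\ell\bar{u}| = \sigma_{\max}(C^\ell)$. There exists $v \in \mathcal{D}_\epsilon$ such that $\|\bar{u} - v\|_2 \le \epsilon$, whence
$\sigma_{\max}(C^\ell) = |\bar{u}^TC^\ell\bar{u}| \le 2\sigma_{\max}(C^\ell)\|\bar{u} - v\|_2 + |v^TC^\ell v| \le 2\sigma_{\max}(C^\ell)\epsilon + \Lambda$ (since the quadratic form $z^TC^\ell z$
is Lipschitz continuous, with constant $2\sigma_{\max}(C^\ell)$ w.r.t. $\|\cdot\|_2$, on S), whence $\sigma_{\max}(C^\ell) \le (1 - 2\epsilon)^{-1}\Lambda$. By
similar reasons, when $k \ne \ell$ we have

$$\left\{v^TA[\ell]^TA[k]v' \le \Lambda\ \ \forall v, v' \in \mathcal{D}_\epsilon\right\} \ \Rightarrow\ \left\{\sigma_{\max}(A[\ell]^TA[k]) \le (1 - 2\epsilon)^{-1}\Lambda\right\}. \tag{A.80}$$

Indeed, let $u, u' \in S$ be such that $u^TA[\ell]^TA[k]u' = \sigma_{\max}(A[\ell]^TA[k])$; if $v, v' \in \mathcal{D}_\epsilon$ satisfy $\|u - v\|_2 \le \epsilon$,
$\|u' - v'\|_2 \le \epsilon$, then

$$u^TA[\ell]^TA[k]u' \le v^TA[\ell]^TA[k]v' + \sigma_{\max}(A[\ell]^TA[k])\left(\|u - v\|_2 + \|u' - v'\|_2\right) \le \Lambda + 2\sigma_{\max}(A[\ell]^TA[k])\epsilon.$$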

3°. We can straightforwardly build an $\epsilon$-net $\mathcal{D}'$ in S in such a way that the $\|\cdot\|_2$-distance between every
two distinct points of the net is $> \epsilon$, so that the balls $B_v = \{z \in \mathbb{R}^d : \|z - v\|_2 \le \epsilon/2\}$ with $v \in \mathcal{D}'$
are mutually disjoint. Since the union of these balls belongs to $B = \{z \in \mathbb{R}^d : \|z\|_2 \le 1 + \epsilon/2\}$, we get
${\rm Card}(\mathcal{D}')(\epsilon/2)^d \le (1 + \epsilon/2)^d$, that is, $N_\epsilon \le {\rm Card}(\mathcal{D}') \le (1 + 2/\epsilon)^d$.

Let us set $\epsilon = \frac{1}{4}$. When using the second inequality of (A.78), we conclude that the probability of
violating the premise in (A.79) with $\Lambda = \lambda m^{-1/2}$ and $\lambda \le \sqrt{m}$ does not exceed $2\exp\{-\frac{\lambda^2}{8} + d\ln[1 + 2\epsilon^{-1}]\} =
2\exp\{-\frac{\lambda^2}{8} + d\ln[9]\}$. Setting $\lambda = \sqrt{m}/4$, we conclude that when $m \ge 128\ln(4n^2/d) + 256d\ln(3)$, one has

$${\rm Prob}\left\{\max_{1 \le \ell \le K}\sigma_{\max}(A^T[\ell]A[\ell] - I_d) \ge \frac{1}{2}\right\} \le \frac{1}{2n},$$

that is, with probability at least $1 - \frac{1}{2n}$ all the matrices $C^\ell + I_d = A^T[\ell]A[\ell] =: C_\ell$ are invertible with
$\max_{1 \le \ell \le K}\sigma_{\max}([C_\ell]^{-1}) \le 2$.
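The arithmetic behind the threshold on m can be replayed numerically. The sketch below assumes the net-size bound $(1 + 2/\epsilon)^d$ with $\epsilon = 1/4$, a per-block failure probability of at most $2\exp\{-\lambda^2/8 + d\ln 9\}$ with $\lambda = \sqrt{m}/4$, and a union bound over the $K = n/d$ blocks; the function names and the sample values of n and d are illustrative.

```python
import math

def net_size_bound(eps, d):
    """Upper bound (1 + 2/eps)^d on the cardinality of a minimal eps-net of the unit sphere."""
    return (1 + 2 / eps) ** d

def min_rows(n, d):
    """Threshold m >= 128 ln(4 n^2 / d) + 256 d ln 3."""
    return 128 * math.log(4 * n * n / d) + 256 * d * math.log(3)

def failure_bound(m, n, d):
    """Union bound over K = n/d blocks of 2 * exp(-lambda^2/8 + d ln 9), lambda = sqrt(m)/4."""
    lam = math.sqrt(m) / 4
    return (n / d) * 2 * math.exp(-lam ** 2 / 8 + d * math.log(9))

n, d = 1000, 4
m0 = min_rows(n, d)
p_fail = failure_bound(m0, n, d)
```

At the threshold value of m the union bound equals $1/(2n)$, and it only decreases as m grows.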

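In the same way, using (A.80) and the first inequality of (A.78) we obtain

$$\lambda := \sqrt{3\ln(4n^3/d^2) + 6d\ln(3)} \le \sqrt{m} \ \Rightarrow\ {\rm Prob}\left\{\sigma_{\max}(A^T[k]A[\ell]) \le \frac{2\lambda}{\sqrt{m}}\ \ \forall k \ne \ell\right\} \ge 1 - \frac{1}{2n}.$$

We conclude that for properly chosen absolute constants $C_1$, $C_2$, under the condition $m \ge C_1\max[d, \ln(n)]$
we have

$${\rm Prob}\left\{\sigma_{\max}(A^T[k]A[\ell]) \le C_2\sqrt{[d + \ln(n)]/m}\ \ \forall k \ne \ell,\quad \sigma_{\max}((A^T[\ell]A[\ell])^{-1}) \le 2\ \ \forall \ell\right\} \ge 1 - \frac{1}{n},$$

so that by the definition of the mutual block-incoherence

$${\rm Prob}\left\{A : \mu(A) \le C_2\sqrt{[d + \ln(n)]/m}\right\} \ge 1 - \frac{1}{n}.$$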



A.11 Proof of Lemma 6.1

The proof below follows the lines of the proof of Proposition 10 of [22]. Let $\Xi^+$ be given by (6.46 c). Let
us fix $\xi \in \Xi^+$, $u \in U$ and $x \in \mathbb{R}^n$ such that $L_1(Bx - [Bx]^s) \le \upsilon$, and let $\eta = u + \xi$, so that $L_\infty(H^T\eta) \le \rho$
due to $\xi \in \Xi^+$. We will proceed by induction. First, let us show that $(a_{k-1}, b_{k-1})$ implies $(a_k, b_k)$. Thus,
assume that $(a_{k-1}, b_{k-1})$ holds true. Let $z^{(k-1)} = x - v^{(k-1)}$. With g and $\Delta$ as defined at the k-th step of the
algorithm, we have

$$Bz^{(k-1)} - g = Bx - Bv^{(k-1)} - H^T(y - Av^{(k-1)}) = (B - H^TA)(x - v^{(k-1)}) - H^T\eta = VBz^{(k-1)} - H^T\eta,$$

by (6.46 a). Now, due to (6.46 b) and the fact that $L_\infty(H^T\eta) \le \rho$ by (6.46 c), we have

$$L_\infty(Bz^{(k-1)} - g) \le L_\infty(VBz^{(k-1)}) + L_\infty(H^T\eta) \le \gamma L_1(Bz^{(k-1)}) + \rho \le \gamma\alpha_{k-1} + \rho =: \nu, \tag{A.81}$$

where the last inequality is due to $(b_{k-1})$. Let us fix $j \in \{1, ..., K\}$; let us consider the case $\|\cdot\|_{(j)} = \|\cdot\|_2$.
Observe that $\Delta[j]$ is the closest to 0 point of the Euclidean ball $S_j = \{w \in \mathbb{R}^{n_j} : \|w - g[j]\|_2 \le \nu\}$. By
the properties of the Euclidean projection, for any $w \in S_j$, $\|w - \Delta[j]\|_2 \le \|w\|_2$, and, since by (A.81)
$(Bz^{(k-1)})[j] \in S_j$, we have

$$\|(Bz^{(k)})[j]\|_2 = \|(Bz^{(k-1)} - \Delta)[j]\|_2 \le \|(Bz^{(k-1)})[j]\|_2 \le \|(Bx)[j]\|_2, \tag{A.82}$$

since by $(a_{k-1})$, $\|(Bz^{(k-1)})[j]\|_2 \le \|(Bx)[j]\|_2$. In the case where $\|\cdot\|_{(j)} = \|\cdot\|_\infty$ we note that $\Delta_{ji}$ is the
closest to 0 point of the interval $S_{ji} = [g_{ji} - \nu, g_{ji} + \nu]$, $i = 1, ..., n_j$, and by (A.81), the segments $S_{ji}$ cover
$(Bz^{(k-1)})_{ji}$. We conclude that $\Delta_{ji} \in {\rm Conv}\{0, (Bz^{(k-1)})_{ji}\}$, so that

$$\|(Bz^{(k)})[j]\|_\infty = \|(Bz^{(k-1)} - \Delta)[j]\|_\infty \le \|(Bz^{(k-1)})[j]\|_\infty \le \|(Bx)[j]\|_\infty,$$

where the last inequality is by $(a_{k-1})$. Together with (A.82) this implies $(a_k)$. Further, we have by (A.81):

$$L_\infty(Bz^{(k)}) = L_\infty(Bz^{(k-1)} - \Delta) \le 2\nu = 2\gamma\alpha_{k-1} + 2\rho.$$
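The Euclidean case of the step above uses only two properties of the point $\Delta[j]$: it is the projection of the origin onto the ball $S_j$, and every w in the ball satisfies $\|w - \Delta[j]\|_2 \le \|w\|_2$ and $\|w - \Delta[j]\|_2 \le 2\nu$. The sketch below checks both on random points; the function name and the test data are illustrative and not taken from the algorithm's statement.

```python
import math
import random

def closest_to_origin_in_ball(g, nu):
    """Projection of 0 onto the Euclidean ball {w : ||w - g||_2 <= nu}."""
    norm_g = math.sqrt(sum(v * v for v in g))
    if norm_g <= nu:
        return [0.0] * len(g)  # the ball contains the origin
    scale = 1.0 - nu / norm_g  # shrink g toward 0 by nu
    return [scale * v for v in g]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

rng = random.Random(0)
g = [rng.gauss(0.0, 1.0) for _ in range(4)]
nu = 0.7
delta = closest_to_origin_in_ball(g, nu)

all_ok = True
for _ in range(200):
    # Draw a random point w of the ball centered at g with radius nu.
    direction = [rng.gauss(0.0, 1.0) for _ in range(4)]
    radius = nu * rng.random()
    w = [gi + radius * di / norm2(direction) for gi, di in zip(g, direction)]
    diff = [wi - di for wi, di in zip(w, delta)]
    all_ok = all_ok and norm2(diff) <= norm2(w) + 1e-12 and norm2(diff) <= 2 * nu + 1e-12
```

The first inequality is the nonexpansiveness of the projection (both w and the origin are projected, and w is its own projection); the second holds because w and $\Delta[j]$ lie in the same ball of radius $\nu$.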

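Let now $I \subset \{1, ..., K\}$ be the set of indexes of the s largest in magnitude blocks in Bx, and $J = \{1, ..., K\} \setminus I$.
Now by $(a_k)$,

$$\begin{array}{rcl}
L_1(Bz^{(k)}) &=& L_1([Bz^{(k)}]_I) + L_1([Bz^{(k)}]_J) \le \sum_{j \in I}\|(Bz^{(k)})[j]\|_{(j)} + L_1([Bx]_J) \\
&\le& sL_\infty(Bz^{(k-1)} - \Delta) + \upsilon \le 2s\nu + \upsilon = \alpha_k,
\end{array}$$

and we conclude that $(b_k)$ holds, which completes the induction step.

It remains to show that $(a_0, b_0)$ holds true. Since $(a_0)$ is evident, all we need is to justify $(b_0)$. Let

$$\alpha_* = L_1(Bx),$$

and let $g = H^Ty$. Same as above (cf. (A.81)), we have

$$L_\infty(Bx - g) \le \gamma\alpha_* + \rho.$$

Then

$$\begin{array}{rcl}
\alpha_* = L_1(Bx) &=& L_1([Bx]^s) + L_1(Bx - [Bx]^s) \\
&\le& L_{s,1}(g) + s[\gamma\alpha_* + \rho] + \upsilon = L_{s,1}(g) + s\gamma\alpha_* + s\rho + \upsilon.
\end{array}$$

Hence

$$\alpha_* \le \frac{L_{s,1}(g) + s\rho + \upsilon}{1 - s\gamma},$$

which implies $(b_0)$. ∎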
1 — 57 
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